VOLUME III


This  report  has  been  prepared  by  TRW  Defense 
and  Space  Systems  Group.  The  work  summarized 
in  this  report  has  been  performed  under  Con- 
tract N00014-74-C-0068  and  was  directed  for 
the  Navy  by  Dr.  D.  F.  Barbe  and  Dr.  W.  D. 
Baker  of  the  Naval  Research  Laboratory.  The 
period  of  performance  was  from  January  1975 
to  February  1976. 
1. INTRODUCTION AND SUMMARY


1.1  INTRODUCTION 

In  1973,  the  Naval  Research  Laboratory  issued  a request  for  quotation  for  a study 
program  aimed  at  defining  and  analyzing  those  areas  of  application  of  charge  coupled 
devices  (CCDs)  in  signal  processing  systems.  The  broad  objective  of  the  RFQ  was  to 
initiate  a study  that  would  examine  the  impact  of  CCD  technology  on  signal  processing 
systems.  Implicit  in  such  a statement,  of  course,  is  the  requirement  to  determine 
those areas of signal processing systems where the use of CCDs offers an economic
advantage. The extent of that advantage, that is to say the impact, can then be
projected. Naturally, the projection cannot be made in terms of dollars and cents, but
is  best  made  by  direct  comparison  of  identical  functions  realized  with  CCDs  and  any 
other  appropriate  technology.  Under  these  conditions,  numbers  such  as  speed,  power, 
and  parts  count  can  be  tabulated  and  cross-correlated. 

As a result of the proposal submitted to the Naval Research Laboratory, TRW
embarked on a study of the impact on signal processing systems of the use of CCDs in
the digital domain. The results of that study have been issued under the title
"Charge Coupled Devices in Signal Processing Systems; Volume I: Digital Signal
Processing."* Briefly stated, the study indicated that digital CCDs combine the
inherent advantages of any digital technology (such as high noise immunity, freedom from
device/parameter  variations,  and  stable  operating  conditions)  with  the  advantages 
peculiar  to  CCDs  (such  as  high  density  and  low  power).  In  addition,  digital  CCDs  are 
best  suited  to  those  signal  processing  applications  where  the  signal  flow  can  be  car- 
ried out  in  a pipelining  fashion  requiring  little  or  no  feedback;  this  permits  rela- 
tively high  data  throughput  to  be  accomplished  with  the  relatively  low  CCD  clock 
frequencies.  Not  surprisingly,  the  impact  is  most  dramatic  in  those  situations  where 
a large  number  of  functions  and/or  high  computational  accuracy  is  demanded.  A large 
number  of  such  instances  occur  in  existing  and  projected  systems;  these  were  identi- 
fied and  analyzed  in  some  detail. 

At  the  conclusion  of  the  study,  TRW  recommended  that  an  experimental  verification 
be  carried  out  that  would  go  beyond  the  basic  device  work  already  accomplished  and 
would  demonstrate  the  real  advantages  of  the  approach.  The  realization  of  a digital 
CCD  fast  Fourier  transform  on  a chip  was  selected  as  a useful  vehicle;  additionally 
this  function,  properly  implemented,  is  quite  flexible  and  suited  to  a number  of  diverse 
situations.  Accordingly,  a technology  development  program  was  entered  into.  The  ob- 
jective of  the  first  phase  is  the  investigation  and  characterization  of  the  fundamental 
building blocks that would be employed in a typical application. The objective of the
second phase is to develop those building blocks into larger circuit functions suitable
for implementing the FFT. The third phase is directed at an FFT demonstration unit
specifically aimed at a to-be-determined application. This report is being issued at
the end of the first phase. The second phase duration is 13 months; the duration of
the third phase is dependent on the application selected. The chronology of events is
summarized in Figure 1-1.

* Available from the National Technical Information Service; a companion report,
"Charge Coupled Devices in Signal Processing Systems; Volume II: Analog Signal
Processing," is also available.

1.2  PHASE  1 REPORT  SUMMARY 

This  report  contains  an  overview  of  the  entire  program  and  a brief  statement  of 
goals  and  approaches.  This  is  followed  by  a discussion  of  the  development  of  the  full 
adder  circuit  function.  The  original  concept  is  explained  and  subsequent  alterations 
to  the  original  layout  are  described;  both  two  and  three  input  adders  are  treated 
(Section  2).  There  are  some  hardware  implications  in  the  several  computational  algo- 
rithms that  can  be  used  and  these  are  examined  in  Section  3.  The  primary  test  mask 
that  was  designed  during  Phase  1 is  presented  along  with  a summary  of  the  test  results 
in Section 4. The process sequences being employed to produce these devices are
explained, and cross-sectional views of the devices are given, in Section 5. Section 6
presents the results of a study made to determine a method of interconnecting
a number of the projected FFT chips into a single system. The report
concludes in Section 7 with a recommendation for future work.


Figure  1-1.  Chronology  of  Program 
2. CHARGE DETECTION FOR DIGITAL LOGIC PURPOSES


2.1  OPERATIONAL  REQUIREMENTS 

Several distinctly different ways exist to perform logic with CCDs. In one
approach,  two  or  more  individual  charge  packets  are  combined  into  a single  storage 
area.  Then,  by  making  measurements  on  this  result,  information  is  derived  about  the 
original  packets.  This  technique  is  useful  only  insofar  as  the  information  derivable 
from  the  addition  cannot  be  obtained  from  the  isolated  individual  packets.  As  a 
simple  example  of  this  approach,  consider  the  electrode  arrangement  of  Figure  2-1. 

This  figure  shows  the  conceptual  arrangement  for  performing  an  AND  and  an  OR  function. 
If  all  the  electrodes  can  store  an  equal  amount  of  charge,  then  when  the  contents  of 
A and  B are  simultaneously  transferred  to  the  common  gate  covering  the  confluence  of 
the two channels, several results are possible. If A and B are both empty (i.e.,
A = 0 = B) then, of course, there will be no charge transferred to the common gate, and
with the next transfer the electrode labeled C will also contain no charge (C = 0).
If either A or B is storing charge, C eventually will also do so. If A and B both
contain charge, clearly C will too, after the contents of the common gate are emptied
into it. However, since the common gate can hold only the charge of either A or B,
the excess charge must flow under D. By examining the truth tables of C and D, it is
clear that C generates the OR function and D generates the AND function. Other more
complicated functions can also be realized in this manner. The identity of the original
digital data which provides the A and B values is lost in the process of performing the function.
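
The charge-combining behavior described above can be illustrated with a minimal sketch. It assumes an idealized model that is not spelled out in the report: every packet is exactly one unit of charge, the common gate holds exactly one unit, and transfers are lossless. The names `WELL_CAPACITY`, `merge_packets`, and `truth_table` are hypothetical and used only for illustration.

```python
from itertools import product

WELL_CAPACITY = 1  # assumption: every electrode stores exactly one unit packet


def merge_packets(a: int, b: int) -> tuple[int, int]:
    """Merge packets a and b (each 0 or 1) under the common gate.

    Returns (c, d): c is the charge the common gate passes on to electrode C,
    d is the excess charge that spills under electrode D.
    """
    total = a + b
    c = min(total, WELL_CAPACITY)  # the common gate keeps at most one packet
    d = total - c                  # anything beyond that flows under D
    return c, d


def truth_table() -> list[tuple[int, int, int, int]]:
    """Rows of (A, B, C, D) for all four input combinations."""
    return [(a, b, *merge_packets(a, b)) for a, b in product((0, 1), repeat=2)]


if __name__ == "__main__":
    print(" A B | C D")
    for a, b, c, d in truth_table():
        print(f" {a} {b} | {c} {d}")
```

In this idealized model column C reproduces A OR B and column D reproduces A AND B, while the original A and B packets no longer exist separately after the merge.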

This loss of data is not necessary, and other schemes exist which preserve the
original  data  streams.  In  order  to  perform  logic  (or  any  other  operation)  on  data 
streams  and  leave  them  undisturbed,  a nondestructive  sensing  technique  must  be  employed. 
Then  the  presence  or  absence  of  charge  may  be  detected  and  some  action  taken  as  a 
result  of  this  detection.  The  action  taken  may  involve  switching  a MOS  FET  from  one 
state  to  another  or  simply  controlling  flow  in  another  CCD  as  a result  of  this  charge 
detection in the CCD containing the original data stream. The floating diffusion and
the floating gate are two common means of performing such detection.

The circuits described in this report make use of both of the above types of circuit
functions to achieve the overall desired operation; in some cases individual
charge packets are combined, and in other cases nondestructive sensing is all that
is  needed.  Which  technique  is  used  is  simply  a matter  of  the  best  way  to  perform  the 
task  at  hand.  The  basic  philosophy  is  to  begin  with  the  CCD  characteristics,  their 
advantages  and  limitations,  and  design  to  achieve  the  "black  box"  function  desired. 
Merely  mimicking  the  approach  of  a similar  function  constructed  using  other  than  CCD 
technology  generally  produces  a cumbersome  solution. 

2.2  FLOATING  GATE  AMPLIFIER  DESIGN 

A key  element  in  the  design  of  CCD  digital  processing  systems  is  the  floating 
gate  amplifier.  The  floating  gate  amplifier  is  involved  in  the  logic  decision  pro- 
cess, and  its  proper  operation  is  essential  to  the  overall  function.  The  floating 
gate nondestructively detects the presence or absence of charge in one location and
transmits  and  utilizes  the  information  to  control  the  flow  of  charges  at  another 
location. 

Thus  the  floating  gate  must  be  sensitive  to  signal  charge  and  its  variations  but 
at  the  same  time  be  insensitive  to  all  parasitic  type  charges.  These  parasitic  charges 
may  be  intrinsic  to  the  processing  cycle,  or  induced  image  charges  generated  by  charges 
which  may  exist  in  the  atmosphere  or  may  be  deposited  on  surfaces.  Theoretical  and 
experimental  considerations  indicate  that  a FET  attached  to  a floating  gate  structure 
would  permit  the  proper  floating  gate  performance.  Evolution  of  the  floating  gate  am- 
plifier design  is  discussed  below. 

2.2.1  Floating  Gate  Amplifier  Design  on  LSM-1 

The floating gate amplifier configuration, as developed on LSM-1, is shown in
Figure 2-2.

During testing of this floating gate amplifier, the gate voltage versus surface
potential measurements on the same device were not reproducible. Further test and
analysis revealed that for drain voltages in excess of 10 volts (relative to the floating
gate voltage) charge injection onto the floating gate structure took place. This
charge injection takes place between the drain and floating gate. It is postulated
that hot electrons which are available at the drain are accelerated to the floating
gate structure because of the electric field that exists between the drain and floating
gate structure. Since the FET drain and source diffusion breakdown voltage was greater
than 20 volts, breakdown was not involved in the threshold shift mechanism.


Figure 2-2. Floating-Gate Test Configuration (VFGE variable from 0 to -35 V;
1 µA injected into source while VD = -10 V)

To  verify  that  charging  was  due  to  injection  on  the  floating  gate  structure  and 
not  due  to  trapping  at  the  interfaces,  a test  FET  on  the  same  chip  was  configured  to 
represent  the  floating  gate  FET  (Figure  2-3).  By  applying  the  voltages  to  the  gate 
and  then  lifting  the  gate  probe  off,  a charge  was  left  on  the  gate;  this  charge  should 
exhibit  the  same  characteristics  as  those  of  the  floating  gate  amplifier  FET.  Tests 
indicated that when the drain voltage was more negative than the gate voltage
(ΔV ≈ 10 V), charge was injected onto the gate, making the gate more negative. These
results  are  depicted  in  Figures  2-4  and  2-5.  Since  the  gate  of  the  test  FET  is  acces- 
sible, while  that  of  the  floating  gate  is  not,  the  stored  charge  was  completely 
removed  by  connecting  the  gate  to  ground.  This  experiment  was  repeated  many  times  with 
the  same  results. 

When this understanding was applied to the floating gate amplifier configuration, the
charge deposited onto the floating gate structure was found to be controllable and
predictable. The gate voltage versus surface potential curves were reproducible and in
agreement with theory (Figure 2-6).
Figure 2-3. Basic Test FET Configuration


Figure 2-4. IDS vs VDS with -10 VDC Applied to Gate of Floating Gate Amplifier

Figure 2-5. IDS vs VDS Increased to Where Charge Accumulation Is Observed on Gate of
FET of Floating Gate Amplifier
In addition to this problem, during test of the three-input adders on the LSM-1
mask set, voltages due to stored charge were observed on the floating gate structure.
Tests indicated that charge was accumulating on the floating gate during wafer
processing. To eliminate both problems a FET discharge circuit was added to allow
discharge of the floating gate structure.

Finally,  one  other  problem  was  observed  with  the  floating  gate  configuration  as 
indicated  in  Figure  2-2.  The  bias  gate  is  larger  than  the  floating  gate;  this  gener- 
ates potential  variations  under  the  gates  along  the  channel  (Figure  2-7a).  These  bumps 
produce two regions which are capable of trapping carriers and holding them, and
therefore impair the proper operation of the device. To correct this situation the bias
gate was made smaller relative to the floating gate (Figure 2-7b).

2.2.2  Floating  Gate  Amplifier  Design  on  DP-0  Mask  Set 

The  above  concept  of  adding  a FET  discharge  to  the  floating  gate  structure  was 
incorporated  in  the  design  of  the  floating  gate  amplifier  and  three-input  adder  on  the 
DP-0  mask  set  (see  Figure  2-8).  Tests  performed  on  the  circuits  indicated  that  they 
were  not  susceptible  to  charge  accumulation. 
Figure 2-7. Typical Surface Potential Variation Along a Channel
for Two Different Gate Arrangements

Figure 2-8. Floating Gate Configuration Used on DP-0


Determination  of  FET  Charge/Discharge  Characteristics 

Since the floating gate with reset FET now possesses a finite impedance to substrate
(through the reset FET drain diffusion), the control-gate-to-floating-gate
transfer function is of the form of a highpass network. As a result, ramp voltages
applied  to  the  control  gate  will  result  in  dc  floating  gate  voltages.  A diagram  of 
the  test  setup  along  with  the  equation  for  the  floating  gate  voltage,  resulting  from 
a ramp  voltage  applied  to  the  control  gate,  are  provided  in  Figure  2-9. 
Figure 2-9. Floating Gate Test Setup for Ramp Inputs
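
The equation of Figure 2-9 is not reproduced in the text, so the following is only a sketch of the ramp response of a generic highpass network of this kind. It assumes that C1 couples the control gate to the floating gate and that the floating gate sees C2 in parallel with R to the substrate; under that assumption a ramp of slope V̇ gives V_FG(t) = V̇·R·C1·(1 - exp(-t / (R·(C1 + C2)))). The function name and all component values are hypothetical.

```python
import math


def floating_gate_ramp_response(ramp_rate, t, r, c1, c2):
    """Floating gate voltage (V) after a control-gate ramp has run for t seconds.

    Assumed network (not taken from Figure 2-9): C1 in series from the control
    gate to the floating gate, with C2 and R in parallel from the floating
    gate to the substrate.

    ramp_rate : ramp slope in V/s
    r         : resistance to substrate in ohms
    c1, c2    : capacitances in farads
    """
    tau = r * (c1 + c2)
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x
    return ramp_rate * r * c1 * -math.expm1(-t / tau)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the two limiting regimes
    R, C1, C2, RATE = 1e13, 0.1e-12, 0.2e-12, 5.0
    for t in (1e-3, 1.0, 3.0, 30.0):
        v = floating_gate_ramp_response(RATE, t, R, C1, C2)
        print(f"t = {t:8.3f} s  ->  V_FG = {v:.4f} V")
```

For times much shorter than R·(C1 + C2) the result follows the capacitive divider V̇·t·C1/(C1 + C2); for long ramps it settles to the dc value V̇·R·C1, which is the behavior the paragraph above describes.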

The surface potential of the channel region of the floating gate FET was determined
by measuring the FET source voltage required for a 1 µA drain-to-source current.
To establish baseline data on the floating gate FET, the reset FET attached to
the floating gate was driven into conduction and a voltage ramp applied directly
to the floating gate. The channel region surface potentials were measured (using the
1 µA criterion described above) for several different values of floating gate voltage.
These results are presented in the first and second columns of Table 2-1. After
obtaining this baseline data, voltage ramps were applied to the control gate (with
the floating gate reset FET gate and drain grounded) and the ramp rate (V̇, in V/sec)
and duration (t) adjusted to obtain the same surface potentials as those obtained in the
baseline run (Table 2-1). From knowledge of V̇, t, C1, and C2 (calculated from the device
geometry), the remaining variable in the floating gate voltage equation (Figure 2-9)
is the resistance of the reset FET drain diffusion (R). When a value of 10^13 ohms was
used for R, good agreement was obtained between the calculated floating gate voltage
and the measured baseline values. The calculated floating gate voltages are provided
in the third column of Table 2-1.


Table 2-1. Comparison of Calculated vs Measured Floating
Gate Voltages for R = 10^13 ohms

Floating Gate Surface    Floating Gate Voltage by     Calculated Floating Gate
Potential (φs)           Direct Measurement (VFGD)    Voltage from Ramp Input,
(Volts)                  (Volts)                      R = 10^13 Ω (Volts)

-0.6                     -1.8
-1.0                     -2.3
-1.5                     -2.8
-2.0                     -3.5
-3.0                     -4.8
-4.0                     -6.0
-5.0                     -7.0


Subsequent discharge time-constant tests and sinusoidal phase shift tests were
performed which provided agreement with the 10^13 ohm value for reset FET drain
diffusion resistance. These test results have thus determined the drain diffusion
resistance and yielded good agreement between device measurements and calculations based on
the  present  model.  Results  to  date  for  pulsed  control  gate  voltages  indicate  that  a 
pulse  amplitude  of  approximately  twice  that  predicted  by  the  model  is  required  to 
threshold  the  floating  gate  FET.  The  cause  of  this  discrepancy  is  not  known. 

2.2.3 Floating Gate Amplifier Design on DP-1 Mask Set

Further theoretical analysis revealed that the desired performance and physical
configuration of the floating gate amplifier could be achieved by designing and operating
the floating gate structure as shown in Figure 2-10.
Figure 2-10. Floating Gate Amplifier with FET Discharge

Operation of this floating gate structure is as follows. At time t0, VR is
applied to the FET; its polarity turns the FET on, applying the voltage VG to the
floating gate structure. At time t1 (t1 = t0 + Δt), the FET is turned off by removing
voltage VR. The floating gate structure is now floating and charged to a potential
VG and produces the surface potentials φs1 and φs2 under the gates designated
master and slave end. The master end detects the presence or absence of charge Q
being dumped into its well. The slave end varies its surface potential φs2 as a function
of the amount of charge Q being dumped into the master well. This is accomplished
in the following way. As charge is dumped in the master side well, it changes the surface
potential to a new level, which in turn changes the charge distribution on
the floating gate structure. This charge distribution in turn varies the surface
potential φs2 under the load gate. This variance in φs2 is then used as part of the
logic function. Due to the inherent inversion in this operation, charge is allowed to
flow in the slave channel if no charge is detected under the master gate, and slave
channel charge flow is prevented when charge is detected under the master gate.

Theoretical  Analysis  of  Floating  Gate  Structure 

The magnitude of the variance in slave gate surface potential plays an important
role in the operation of the logic circuits. A closed-form expression which directly
relates the changes of surface potential at the master end to changes of surface
potential at the slave load end is very desirable. The expression will be a function
of changes in floating gate voltage (ΔVG), any initial charge originally contained
in the master well (fat zero), the initial gate voltage VG applied at time t0, and the
amount of charge Q being dumped into the detector well.

The design equation is obtained by writing Kirchhoff's voltage equation for the
gate, oxide, and semiconductor system:

$$V_G - V_{FB} = \phi_s + V_{ox} \tag{2.1}$$

where $V_G$ is the applied gate voltage, $V_{FB}$ is the flat band voltage, $V_{ox}$ is the voltage
drop across the oxide, and $\phi_s$ is the surface potential under the gate.

Solving for the explicit dependence of $\phi_s$ on the device parameters and bias
conditions gives

$$\phi_s = V + V_0 - \left(V_0^2 + 2 V V_0\right)^{1/2} \tag{2.2}$$

with $V = V_G - V_{FB} + q/C_{ox}$ and $V_0 = e\,\epsilon_s N / C_{ox}^2$, where $q$ is the signed
charge per unit area stored under the gate, $\epsilon_s$ is the dielectric constant of the
semiconductor, $C_{ox}$ is the oxide capacitance per unit area, $\epsilon_{ox}$ is the permittivity
of the oxide, $e$ is the electronic charge, and the doping concentration is $N$.


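The change in the detector end surface potential $\phi_{s1}$ can be expressed as

$$\Delta\phi_{s1} = \left(\Delta V_G + \frac{\Delta Q}{C_{ox2}}\right)\left\{1 - \left[1 + \frac{2\left(V_G - V_{FB2} + q_0/C_{ox2}\right)}{V_{02}}\right]^{-1/2}\right\} \tag{2.3}$$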


The other functional relationship between the master and slave gates is derived from
their interconnection, which is depicted in Figure 2-11 and is expressed by
equation (2.4):

$$\Delta V_G = \frac{C_{ox2}\,\Delta\phi_{s1} + C_{ox4}\,\Delta\phi_{s2}}{C_{ox2} + C_T + C_{ox4}} \tag{2.4}$$


where  CT  includes  all  other  capacitances  of  the  floating  gate  structure,  including  the 
source  diffusion  capacitance  of  the  FET. 


Figure  2-11.  System  Equivalent  Circuit 
Floating  Gate  Amplifier 


Equation (2.4) can be put in the form

$$\Delta\phi_{s2} = \frac{\left(C_{ox2} + C_T + C_{ox4}\right)\Delta V_G - C_{ox2}\,\Delta\phi_{s1}}{C_{ox4}} \tag{2.5}$$

In equation (2.5), $\Delta\phi_{s2}$ represents the change in the surface potential under the slave gate,
which is always empty of charge during its normal operation. Thus the slave gate can
be represented by equation (2.3) with $q_0$ set to zero and $\Delta Q = 0$:


$$\Delta\phi_{s2} = \Delta V_G\left\{1 - \left[1 + \frac{2\left(V_G - V_{FB4}\right)}{V_{04}}\right]^{-1/2}\right\} \tag{2.6}$$


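Using equations (2.6), (2.5), (2.3) and $q_0 = 0$ (no fat zero), the desired result of
finding the changes of the surface potential under the slave end as a function of the
charges being dumped into the master end is given as

$$\Delta\phi_{s2} = \frac{\Delta Q\left\{1 - \left[1 + \frac{2\left(V_G - V_{FB2}\right)}{V_{02}}\right]^{-1/2}\right\}\left\{1 - \left[1 + \frac{2\left(V_G - V_{FB4}\right)}{V_{04}}\right]^{-1/2}\right\}}{C_T + C_{ox2}\left[1 + \frac{2\left(V_G - V_{FB2}\right)}{V_{02}}\right]^{-1/2} + C_{ox4}\left[1 + \frac{2\left(V_G - V_{FB4}\right)}{V_{04}}\right]^{-1/2}} \tag{2.7}$$

where the subscripts 2 and 4 refer to the master and slave gates respectively.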
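
The coupling between the master and slave ends can be explored numerically. The sketch below is not the report's own calculation: it assumes small-signal relations of the form Δφs1 = (ΔVG + ΔQ/Cox2)·a2, Δφs2 = ΔVG·a4, and charge conservation on the floating gate, (Cox2 + CT + Cox4)·ΔVG = Cox2·Δφs1 + Cox4·Δφs2, where a = 1 - [1 + 2(VG - VFB)/V0]^(-1/2). It treats Cox2 and Cox4 as total gate capacitances, uses positive magnitudes throughout, and takes every numerical value as hypothetical.

```python
import math


def gate_sensitivity(v_gate, v_fb, v0):
    """Assumed small-signal d(phi_s)/dV: 1 - [1 + 2 (V_G - V_FB) / V_0]^(-1/2)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * (v_gate - v_fb) / v0)


def slave_potential_change(dq, v_gate, c_ox2, c_ox4, c_t, v_fb2, v_fb4, v02, v04):
    """Closed-form slave surface potential change for charge dq in the master well."""
    a2 = gate_sensitivity(v_gate, v_fb2, v02)
    a4 = gate_sensitivity(v_gate, v_fb4, v04)
    denom = c_t + c_ox2 * (1.0 - a2) + c_ox4 * (1.0 - a4)
    return dq * a2 * a4 / denom


def slave_potential_change_iterative(dq, v_gate, c_ox2, c_ox4, c_t,
                                     v_fb2, v_fb4, v02, v04, iterations=500):
    """Same quantity obtained by iterating the three coupled relations directly."""
    a2 = gate_sensitivity(v_gate, v_fb2, v02)
    a4 = gate_sensitivity(v_gate, v_fb4, v04)
    dvg = 0.0
    for _ in range(iterations):
        dphi1 = (dvg + dq / c_ox2) * a2      # master end response
        dphi2 = dvg * a4                     # slave end response (no charge)
        # charge conservation on the isolated floating gate
        dvg = (c_ox2 * dphi1 + c_ox4 * dphi2) / (c_ox2 + c_t + c_ox4)
    return dvg * a4


if __name__ == "__main__":
    # Hypothetical device values, for illustration only
    params = dict(v_gate=10.0, c_ox2=0.05e-12, c_ox4=0.05e-12, c_t=0.02e-12,
                  v_fb2=1.0, v_fb4=1.0, v02=0.14, v04=0.14)
    dq = 0.05e-12  # coulombs
    print("closed form:", slave_potential_change(dq, **params))
    print("iterative  :", slave_potential_change_iterative(dq, **params))
```

The two routes agree because the iteration is a contraction: each pass feeds back only the fraction (Cox2·a2 + Cox4·a4)/(Cox2 + CT + Cox4) of the previous gate voltage change, which is below one. The closed form also shows that a larger stray capacitance CT reduces the slave end swing.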
2.3 FULL ADDER DESIGN

2.3.1  Original  Full  Adder  Design  on  Mask  Set  LSM-1 

The  block  diagram  of  the  three-input  full  adder  as  originally  designed  on  a mask 
set  designated  as  LSM-1  is  indicated  in  Figure  2-12,  and  Figure  2-13  is  a photograph 
of  a device.  The  operation  of  the  three-input  adder  (Figure  2-12)  is  as  follows. 


Figure 2-12. Original Full Adder Block Diagram

Figure 2-13. Three-Input Full Adder

Three  digital  inputs  designated  A,  B,  and  G generate  equal  charge  packets  under 
gates  A,  B,  and  G respectively.  The  charge  under  these  gates  is  simultaneously  dumped 
into  the  potential  well  under  the  gate  designated  first  storage  gate.  The  first  storage 
gate,  carry  gate,  intermediate  storage  gate,  and  sum  gate  generate  potential  wells 
which  are  equal  to  each  other,  and  also  equal  to  each  of  the  potential  wells  generated 
under  gates  A,  B,  and  G. 

During  the  time  interval  when  the  shift  registers  A,  B,  and  G are  dumping  charge 
into  the  first  storage  gate,  transfer  gates  1 and  2 are  biased  such  that  if  all  gates 
A,  B,  and  G originally  contained  charge,  then  each  of  the  gates  (first  storage  gate, 
carry  gate,  and  intermediate  storage  gate)  will  have  an  equal  amount  of  charge  under 
them.  The  operation  of  the  three-input  adder  is  based  on  the  fact  that  the  carry  gate 
will  have  charge  under  it  only  if  at  least  two  of  the  three  input  gates  contain  charge. 
The  intermediate  storage  gate  will  have  charge  under  it  only  if  all  the  input  gates 
contain  charge.  The  charge-sensing  device,  called  the  floating  gate  master  end,  de- 
tects the  presence  of  charge  under  the  carry  gate  and  translates  this  information  to 
its  slave  end.  By  means  of  capacitive  coupling,  the  carry  gate  is  used  to  bias  the 
floating  gate.  The  floating  gate  at  the  slave  end  will  allow  charge  to  flow  from  the 
first  storage  gate  to  the  sum  gate  only  if  there  is  no  charge  detected  under  the  carry 
gate;  if  charge  is  detected  under  the  carry  gate,  the  slave  end  prevents  charge  flow. 
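
The overflow and sensing sequence described above can be checked with a small behavioral model. This is a sketch under idealized assumptions: unit charge packets, wells that hold exactly one packet, and lossless transfer. The text does not state how the packet under the intermediate storage gate reaches the output, so the model assumes it is merged into the sum gate; that routing, and the function name `three_input_adder`, are assumptions made for illustration.

```python
from itertools import product


def three_input_adder(a: int, b: int, g: int) -> tuple[int, int]:
    """Charge-overflow model of the three-input adder.

    Inputs are 0 or 1 (empty or full unit packet). Returns (sum_bit, carry_bit).
    """
    total = a + b + g                       # packets dumped into the first storage gate
    first_storage = min(total, 1)           # each well holds one packet
    carry = min(total - first_storage, 1)   # first overflow fills the carry gate
    intermediate = total - first_storage - carry  # second overflow, present only for 3 inputs

    # Floating gate inversion: the slave end lets the first-storage packet
    # reach the sum gate only when no charge is sensed under the carry gate.
    passed = first_storage if carry == 0 else 0

    # ASSUMPTION (not stated in the text): the intermediate storage packet
    # is routed to the sum gate.
    sum_bit = min(passed + intermediate, 1)
    return sum_bit, carry


if __name__ == "__main__":
    print(" A B G | carry sum")
    for a, b, g in product((0, 1), repeat=3):
        s, c = three_input_adder(a, b, g)
        print(f" {a} {b} {g} |   {c}    {s}")
```

Under these assumptions the carry and sum outputs reproduce the binary sum of the three inputs for all eight combinations.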

An  undesirable  feature  of  this  design  is  that  the  sum  gate  must  be  clocked  to 
prevent  charge  flow  from  the  first  storage  gate  until  enough  time  has  elapsed  to  allow 
any  overflow  to  fill  the  well  under  the  carry  gate. 

This  can  be  corrected  by  inserting  a control  gate  between  the  first  storage  gate 
and  the  floating  gate  slave  end,  which  will  not  allow  charge  to  flow  to  the  sum  gate 
during  the  time  interval  gates  A,  B,  and  G are  dumping  their  charge  packets  into  the 
first  storage  gate.  This  will  allow  time  for  charge,  if  available,  to  be  detected 
under the carry gate by the floating gate master end. This new design of the three-input
adder was developed and tested on DP-0, is depicted in Figure 2-14, and is discussed
in paragraph 2.3.3.

2.3.2  Test  Results 

Table  2-2  summarizes  the  characteristics  of  lots  1 through  6;  these  contain 
floating  gate  amplifiers  and  three-input  full  adders  which  were  processed  on  the  LSM-1 
mask. These circuits exhibited unstable threshold shifts and indicated excessive
boron and phosphorus implant.

Table 2-2. Summary of Lots 1 Through 6

Figure 2-14. Full Adder with Additional Transfer Gate

Testing of the three-input adders of lots 7, 10, and 11 of the LSM-1 mask (see
Table 2-3) indicated the presence of charge sufficient to produce large positive
voltages on the poly floating gate (up to 80 volts). In most cases this charge could be
eliminated only by applying a large negative pulse voltage of approximately 90 volts.
Once this charge was eliminated, the gate voltage versus surface potential
measurements were not stable; that is, the measurements shifted in different
directions when taken on the same device. This instability indicated that the large
voltage discharge through the oxide was breaking down the oxide. Since the floating gate
had no direct current path to the substrate, it was not possible to discharge any
charge accumulation. This was corrected in the DP-0 design by adding a discharge
FET to the floating gate.

2.3.3  Full  Adder  Design  on  Mask  Set  DP-0 

Modification  to  Existing  Three-Input  Full  Adder 


The  modified  original  full  adder  circuit  block  diagram,  depicting  the  location  of 
the  additional  transfer  gate,  is  given  in  Figure  2-14  and  a photograph  appears  in 
Figure  2-15.  This  gate  decouples  the  surface  potential  under  the  first  storage  gate 
from  that  of  the  floating  gate  and  allows  an  independent  choice  of  the  dc  potential 
on  the  bias  gate  over  the  floating  gate.  This  gate  and  the  addition  of  the  FET  dis- 
charge capability  to  the  floating  gate  structure  led  to  a more  stable  operation  of 
the  three-input  full  adder. 
Figure 2-15. Modified Three-Input Adder with
FET Discharge


Two  and  Four-Input  Full  Adders 

During evaluation and implementation of the basic CCD three-input full adder, two
additional unique adder design concepts evolved: the two-input adder, which behaves
similarly to a conventional digital half adder, and a four-input adder which features
an automatic "look ahead carry" capability. The basic configurations of the two- and
four-input adders are shown in Figures 2-16 and 2-17 respectively.

Two-Input  Adder 

This  full  adder  differs  from  the  original  basic  three-input  full  adder  by  the 
elimination  of  the  intermediate  storage  gate,  transfer  gate,  and  control  gate  between 
the  carry  out  gate  and  intermediate  storage  gate. 


Figure  2-16.  Two-Input  Adder 
Figure 2-17. Four-Input Adder

The two-input adder performs in the following way: the shift registers Ai
and Bi simultaneously dump their charge packets directly into the potential well under
gate D. The potential wells under gates D, C, L, and N are all equal to each other.

During the time interval when the shift registers Ai and Bi are dumping charge
into D, transfer gate TS is biased off, allowing no direct charge flow from D to L.

The  control  gate,  during  the  above  dumping  interval,  allows  surplus  charge  to 
flow  under  the  floating  gate  master  side,  gate  C.  The  floating  gate  master  side  de- 
tects the  presence  or  absence  of  charges  under  itself  and  allows  charge  under  gate  D 
to flow to gate L only if there is no charge under gate C. This is achieved by
controlling the surface potential under gate C, when the transfer gate, TS, is biased on.

The outputs of gates L and N represent the least significant bit and the next
significant bit respectively of the sum. The purpose of the transfer gate (TR) is
to clear out any excess charge left under gate D before the start of the next sum
sequence.

The  uniqueness  of  the  two-input  CCD  adder  is  in  the  ease  of  design  and  its  abil- 
ity to  add  two  words,  2 bits  at  a time  without  many  of  the  time  delays  inherent  in 
other  CCD  adders. 
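
As a brief check of the two-input description, the following sketch models the gates as unit-capacity wells. It assumes that the packet collected under gate C is what appears at output N; the text names N as the next significant bit but does not state this routing explicitly. The function name `two_input_adder` is hypothetical.

```python
from itertools import product


def two_input_adder(a: int, b: int) -> tuple[int, int]:
    """Return (l, n): least significant and next significant bit of a + b."""
    total = a + b
    d = min(total, 1)        # gate D keeps one packet
    c = total - d            # surplus flows under the floating gate master side, gate C
    l = d if c == 0 else 0   # D empties into L only when no charge is sensed under C
    n = c                    # ASSUMPTION: the packet under C is delivered to N
    return l, n


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        l, n = two_input_adder(a, b)
        print(f"A={a} B={b} -> N={n} L={l}")
```

When both inputs are full, the packet left under D is not transferred to L; in the circuit it is the transfer gate TR that clears it before the next sum sequence.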

Four-Input  Adders 


As indicated in Figure 2-17, the four-input full adder differs from the original
three-input full adder configuration by the addition of the shift register Zi and
gates C, L, and N. Basically, the circuit performs in the following way. The shift
registers Ai, Bi, Gi, and Zi all simultaneously dump their charge packets into the
potential well under gate D. As in the basic configuration, potential wells under
gates A, B, G, Z, D, M, K, and C are all equal to each other. This, in conjunction
with the control gates indicated in Figure 2-17, allows surplus charge to flow in a
sequential manner from under gates D, M, K, and C respectively. Overflow charge cannot
flow directly from the potential well under gate D to the potential well under
gate K, because gate P is biased in the off position during the overflow interval.

That is, during the time interval when the shift registers Ai, Bi, Gi, and Zi are
dumping charge into D, gate P is biased off, allowing no direct charge flow.

The charge sensing element under gate M will allow charge under gate D to flow to
gate K during the time interval immediately after the dumping interval only if there
is no charge under gate M. This also is the case for charge flow from gates K and M
to gates L and N respectively. That is, the charge sensing element under gate C will
allow charge under gates K and M to flow to gates L and N respectively only if there
is no charge under gate C.

The  output  of  gate  C is  the  look  ahead  carry  and  is  represented  by  Ci.  The 
outputs  of  gates  L and  N are  the  least  significant  bit  and  the  next  significant  bit 
respectively  of  the  sum. 

The purpose of the transfer gate (TR) is to clear out any excess charge left
under gates D, K, and M before the start of the next sum sequence.

Test  Results 


As indicated in paragraph 2.2.1, the DP-0 mask differs from the LSM-1 mask in
that a FET discharge circuit was added to allow discharge of the floating gate
structure. Tests performed on the floating gate amplifier devices and three-input
adders from the DP-0 mask set concentrated on devices which had the reset FET
connected to the floating gate structures. The tests were to determine:


• Effective  resistance  across  the  floating  gate  resulting  from  the  reset 
FET  drain  diffusion 

• Conformance  of  these  devices  to  previous  models. 

• Successful  operation  of  the  three-input  full  adder. 

Tests of the DP-0-1, -2, -3, and -5 lots indicated a very poor yield of full adder
circuits. The poor yield was due to metal line breakage over poly steps. This has
been corrected in the DP-1 mask. Table 2-4 summarizes some of the DP-0 results.

Test measurements revealed that the equations which model the operation of the
floating gate amplifier, as designed on DP-0, will have to be modified to reflect
the floating gate amplifier discharge configuration.

Tests also indicated that the addition of the FET discharge to the floating gate
structure makes the floating gate performance insensitive to charge accumulation
on the gate. This FET (as predicted) has contributed to the successful operation
of the three-input full adder, as evidenced by Figures 2-18, 2-19, and 2-20.

Once  the  proper  setup  conditions  were  made  to  the  three-input  full  adder,  and 
without  any  further  adjustments,  the  following  characteristics  were  observed: 

• When  one  input  is  applied  to  the  adder  a sum  signal  output  is  observed 
(Figure  2-18) 

• When  two  inputs  are  applied  to  the  adder  a zero  signal  and  a one  signal 
are  observed  in  the  sum  and  carry  outputs  respectively  (Figure  2-19) 

• When  three  inputs  are  applied  to  the  adder,  one  signal  is  observed  in 
both  the  sum  and  carry  outputs  (Figure  2-20). 

The above observed results are precisely the outputs expected when the above
respective signals are applied to the full adder input.

As  theoretically  predicted,  the  sum  output  for  a single  one  input  does  not 
exhibit  the  output  magnitude  of  the  other  two  cases.  This  variance  in  magnitude  is 
due  to  the  geometrical  design  and  layout  of  the  floating  gate  structure  incorporating 
the  FET  discharge  capability  on  DP-0;  the  DP-1  design  will  produce  equal  amplitude 
output  voltages  for  all  cases. 


Table 2-4. Summary of Lots DP-0-1, -2, -3, -4, and -5

Column headings: Lot No.; Major Processing Characteristics (poly/oxide thickness,
metal/oxide thickness, field oxide; wet/dry); Device/Circuit Characteristics
(threshold voltages: poly/gate, metal/gate, poly/field, metal/field; floating gate);
Test Results; Comments.

Processing and threshold entries (lot assignment illegible)
    Oxide thicknesses: 1000 Å, 2000 Å
    Threshold voltages: -35.0, -27.0; -1.5, -2.0, -16.5
    Vs and Vr curve can't repeat

Light sensitive test
    Wafer 40             Sum channel: Good 14, Bad 1[illegible]   Carry channel: Good 20, Bad 6
    [wafer illegible]    Sum channel: Good 16                     Carry channel: Good 24, Bad 2
    Wafer 42             Good 6, Bad 20
    Wafer 4[illegible]   Sum channel: Good 23, Bad 3              Carry channel: Good 23, Bad 3

Light sensitive test
    Wafer 53             Sum channel: Good 16, Bad 6              Carry channel: Good 21, Bad 1
    Wafer 54             Sum channel: Good 9, Bad 13
    Wafer 55             Sum channel: Good 4, Bad 18              Carry channel: Good 19, Bad 3

Shift register test
    Wafer 53             Good 5, Bad 22
    Wafer 64             Good 8, Bad 18
    Wafer 55             Good 7, Bad 19

Light sensitive test
    Wafer 57             Sum channel: Good [illegible], Bad 18    Carry channel: Good 20, Bad 2
    Wafer 59             Sum channel: Good 3, Bad 19              Carry channel: Good 19, Bad 3

Shift register test
    Wafer 57             Good 3, Bad 24

Comments
    Full adder output circuit tested by light very good, but operation as shift
    register resulted in low yield. Some good units functional as full adder.

    All FET floating gate and full adder were in depletion mode.

    Full adder functional with +10 V substrate bias. Output FET circuit performance
    not affected by increased frequency from 28 kHz to 71 kHz. Operates as full
    adder. Input = 111; sum output decreased as frequency increased.

    Light test result indicates sum channel output low. Shift register test result
    indicates low yield on good unit.

    Light test on both wafers: each had 3 good units, and all 3 good units were at
    the same location on each wafer. Shift register test resulted in low yield.

a) NO INPUT APPLIED

b) SINGLE "ONE" INPUT APPLIED

Figure 2-18. Output to Three-Input Adder

Figure 2-19. Sum and Carry Output When Two "Ones" Are Applied to Three-Input Adder

Figure 2-20. Output with Three "Ones" Applied to Three-Input Adder

2.3.4 Three-Input Full-Adder Designs on Mask Set DP-1

The logic design of multiplier or adder arrays can be mechanized from half-adders,
full-adders, or a mixture of both. Design tradeoffs are associated with all three
approaches; the half-adder approach requires more CCD gates than if only full-adders
are used but has a simpler clock-line layout. A mixture of half-adders and
full-adders requires the least number of CCD gates, but it also requires the design
of two standard logic functions.

In this section we describe the design of a full-adder logic function, its
modification to a half-adder, and the design of a two-input AND gate. The designs of
a two-word adder array and a multiplier array are also described.

A full-adder logic cell has three inputs, a, b, and g, each of which may have
a binary value of 0 or 1. The logic cell adds the three bits together and generates a
sum bit and a carry bit, each having a binary value of 0 or 1. A truth table for the
full-adder logic cell is given in Table 2-5.


Table  2-5.  Truth  Tables  for  Full-Adder,  Half-Adder  and 
AND  Gate  Logic  Functions 
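
The three logic functions named in Table 2-5 can be tabulated with a short Python sketch. The function names (`full_adder`, `half_adder`, `and_gate`) are illustrative and not taken from the report; the sketch only restates the arithmetic definitions given in the text (add the input bits, emit a sum bit and a carry bit).

```python
from itertools import product


def full_adder(a, b, g):
    """Add three input bits; return (sum_bit, carry_bit)."""
    total = a + b + g
    return total % 2, total // 2


def half_adder(a, b):
    """Add two input bits; return (sum_bit, carry_bit)."""
    total = a + b
    return total % 2, total // 2


def and_gate(a, b):
    """Two-input AND: 1 only when both inputs are 1."""
    return a & b


print("Full-adder:  a b g | sum carry")
for a, b, g in product((0, 1), repeat=3):
    s, c = full_adder(a, b, g)
    print(f"             {a} {b} {g} |  {s}    {c}")

print("Half-adder:  a b | sum carry")
for a, b in product((0, 1), repeat=2):
    s, c = half_adder(a, b)
    print(f"             {a} {b} |  {s}    {c}")

print("AND gate:    a b | out")
for a, b in product((0, 1), repeat=2):
    print(f"             {a} {b} |  {and_gate(a, b)}")
```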


The operation of the full-adder cell can best be described by referring to the
schematic shown in Figure 2-21. In the design the charge storage buckets are the CCD
channels under the polysilicon and are labeled in Figure 2-21 as A, B, C, etc. If
any of the three input signals a, b, g are at the binary 1 level, they allow the
corresponding input buckets A, B, G to fill. When the φ1 clock line goes to its
negative value, the charges in A, B, G transfer to the D storage area. Since the D
storage area is identical in size to A, B, or G, it will fill completely if any one
of the three input buckets is full. However, if two input buckets are full, it will
overflow and also completely fill storage area C under the floating gate. The charge
in the C storage area will cause a voltage change to occur on the master end of the
floating gate. This voltage change is immediately transferred to the slave end, which
in turn causes a change in the surface potential of the CCD channel beneath it,
inhibiting the transfer of charges along the channel.


Figure  2-21.  Schematic  Diagram  of  Full  Adder  Test  Cell 

When  all  three  input  buckets  A,  B,  G are  full,  then  both  the  D and  C storage  areas 
spill  over  and  the  I storage  area  also  fills  completely. 

For the first condition described, when only one input bucket is full, the charge
stored in D is transferred into the H storage area, then under the slave end of the
floating gate into M, and subsequently into S at the next φ1 negative transition and
out as a binary 1 sum bit.

When two input buckets fill both the D and C storage areas, the charge in D again
transfers into H, but now it is inhibited from passing under the floating gate. It is
removed from the H storage cell before the next input bit appears by φ5 going negative
and transferring the charge to the discharge diode, and the sum bit is now a 0. Also
at the φ5 negative edge, the charge stored in the C storage area is transferred out
and becomes a binary 1 carry bit. For the final condition, when all three input buckets
and also the D, C, and I storage buckets are full, the identical transfer conditions
exist as when two input buckets are full, except that the charge in the I storage area
transfers out at the φ4 negative transition. Thus both the sum and carry outputs have
a binary 1 output.
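
The overflow sequence described above can be sketched as a small bucket model in Python. This is a simplified illustration, not the report's design equations: each bucket is assumed to hold exactly one unit charge packet, and the names `d`, `c`, `i` simply mirror the D, C, and I storage areas.

```python
def full_adder_buckets(a, b, g):
    """Model the D -> C -> I overflow of the full-adder cell.

    Each full input bucket (A, B, G) contributes one unit charge packet.
    Returns (sum_bit, carry_bit).
    """
    packets = a + b + g

    # D fills first; overflow from D fills C; overflow from C fills I.
    d = min(packets, 1)
    c = min(max(packets - 1, 0), 1)
    i = min(max(packets - 2, 0), 1)

    # Charge under C (master end of the floating gate) inhibits the
    # D -> H -> M -> S path; the blocked packet goes to the discharge diode.
    passed_to_sum = d if c == 0 else 0

    # The I packet, when present, is clocked out as the sum bit.
    sum_bit = 1 if (passed_to_sum or i) else 0
    carry_bit = c
    return sum_bit, carry_bit


for n_ones, inputs in enumerate([(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]):
    print(n_ones, "ones ->", full_adder_buckets(*inputs))
```

Comparing the model with ordinary binary addition over all eight input combinations is a quick way to confirm that the overflow-and-inhibit description reproduces the full-adder function.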

The sequences of charge storage and transfer can be derived from the waveforms of
the full-adder clock phases shown in Figure 2-22.



Figure  2-22.  Clock  Line  Phase  Sequence  for  Full-Adder 
and  Half-Adder  Functions 

Several clock lines are required for the full-adder cell, and in an attempt to
keep the clock lines to a minimum number, it was decided to make the storage areas of
A, B, G, D, M, I, S, K, N identical. For the layout reasons explained in Section 4.3,
the standard area of these standard gates is 0.69 mil². In each of the logic cells
described, the aluminum gate is used for transfer and the polysilicon gate is used for
storage. In general, an aluminum gate is connected to a polysilicon gate. The
maximum amount of charge that can be stored in the D, M, I, etc., storage areas can
be represented by Q_s:

$$Q_s = A_s\left[K_s\left(\sqrt{\phi_p} - \sqrt{\phi_a}\right) + C_{ox}\left(\phi_p - \phi_a\right)\right] \qquad (2.8)$$

where K_s is a constant that involves the semiconductor processing parameters, A_s is
the surface area of the gate, C_ox is the oxide capacitance per unit area, φ_p is the
value of surface potential under the polysilicon gate, and φ_a is the value of surface
potential under the metal gate. The surface potential is measured at a specific gate
voltage and obtained from gate voltage-surface potential curves plotted from a wafer
that was processed to a schedule identical to that used for the current design.

During the design of DP-1, only the first term of equation (2.8) was retained as
an approximation to the actual Q_s. Thus for the DP-1 process, at a φ1 gate voltage of
-8.7 volts, φ_p = 3.5 and φ_a = 1.25:

$$Q_s = K_s\,(0.69)\left(\sqrt{3.5} - \sqrt{1.25}\right) = 0.518\,K_s$$
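
As a check on the arithmetic, the retained first term can be evaluated directly. The sketch below assumes the standard gate area is 0.69 mil² (the factor that appears in the worked equation) and reports the result as a multiple of K_s; the function and variable names are illustrative.

```python
import math

AREA_MIL2 = 0.69   # standard storage-gate area used in the worked equation
PHI_P = 3.5        # surface potential under the polysilicon gate (volts)
PHI_A = 1.25       # surface potential under the metal gate (volts)


def well_capacity_over_ks(area, phi_p, phi_a):
    """First term of equation (2.8) divided by K_s."""
    return area * (math.sqrt(phi_p) - math.sqrt(phi_a))


q_over_ks = well_capacity_over_ks(AREA_MIL2, PHI_P, PHI_A)
print(f"Q_s / K_s = {q_over_ks:.4f}")
```

Direct evaluation gives a value of about 0.519, which agrees with the 0.518 quoted in the text to within rounding.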


In order to obtain as large a voltage change as possible under the master end of the
floating gate, the C storage area was made as small as possible. The minimum area was
controlled by the spacing of three metal gates that overlap the C polysilicon area,
resulting in an area of A_c = 0.3 mil².

When two input signal lines are at binary 1, both the D and C storage areas must
fill completely, yet not spill over to the I area. Therefore the full level of the C
bucket must be identical to that of the D area. If the approximate well size
expression is rearranged, the empty well potential, φ_e, under the floating gate can be
derived

$$\phi_e = \left(\frac{0.518\,K_s}{K_s A_c} + \sqrt{1.25}\right)^2 = 8.09 \qquad (2.9)$$

However, positive feedback is induced to the C storage area from the floating gate. As
the C storage area fills with charge, the induced voltage change on the floating gate
results in an apparent reduction of the empty C bucket size. Since a full bucket depth
of φ_e - φ_a is required, it is necessary to precharge the floating gate to an initial
voltage that compensates for the feedback reduction. The surface potential of the
required empty bucket under the floating gate can be derived from

$$\phi_i = \left(\phi_e - \phi_a\right)\frac{C_{ox1}}{C_p} + \phi_e$$

where

C_ox1 = oxide capacitance over the C storage area

C_p = total parasitic capacitance of the floating gate

For the subject gate design C_ox1 = 0.034 pf and C_p = 0.058 pf

$$\phi_i = (8.09 - 1.25)(0.592) + 8.09 = 12.1\ \text{volts} \qquad (2.10)$$
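
The two worked values above can be reproduced numerically. This is a sketch under stated assumptions: the symbol names (`phi_e`, `phi_i`) are labels chosen here for the empty-well potential and the precharged empty-bucket potential, and the feedback factor is taken as the ratio C_ox1/C_p of the two capacitances quoted in the text.

```python
import math

Q_OVER_KS = 0.518   # well capacity divided by K_s, from the previous step
A_C = 0.3           # area of the C storage region (mil^2)
PHI_A = 1.25        # surface potential under the metal gate (volts)
C_OX1_PF = 0.034    # oxide capacitance over the C storage area (pF)
C_P_PF = 0.058      # total parasitic capacitance of the floating gate (pF)

# Equation (2.9): rearrange Q_s = K_s * A_c * (sqrt(phi_e) - sqrt(phi_a)).
phi_e = (Q_OVER_KS / A_C + math.sqrt(PHI_A)) ** 2

# Equation (2.10): compensate for feedback through the floating gate.
feedback = C_OX1_PF / C_P_PF
phi_i = (phi_e - PHI_A) * feedback + phi_e

print(f"phi_e    = {phi_e:.2f} volts")
print(f"feedback = {feedback:.3f}")
print(f"phi_i    = {phi_i:.1f} volts")
```

The capacitance ratio evaluates to about 0.586 rather than the 0.592 used in the text; either factor gives 12.1 volts when rounded to one decimal place.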

From  the  gate  voltage-surface  potential  curves  for  polysilicon,  a gate  voltage  of 
VG  = 17.8  volts  is  required  to  produce  a surface  potential  Vs  = 12.1  volts. 

At the appropriate time the charge in the C storage area is transferred into
the N storage area by the negative transition of φ5. In order to transfer all of the
charge in the C storage area, it is necessary to make the φ5 negative level -24 V,
which results in a 0.9 volt step from the φ_i level of 12.1 volts down to 13 volts.

The full-adder and the half-adder cells are designed to have a hybrid floating
gate; the master end is polysilicon and the slave end is aluminum. The 17.8 volt
precharge voltage applied to the floating gate results in an initial surface potential
under the (aluminum) slave end of φ_s1 = -9.4 volts.

When the C storage bucket is filled, it induces a voltage change in the floating
gate, resulting in a new value at the slave end of

$$17.8 - (12.1 - 1.25)(0.37) = 13.8\ \text{volts}$$


The 13.8 volts on the metal end of the floating gate produces a new φ_s2 = -6.5 volts.
The transfer gate under the slave end of the floating gate then switches from -9.4 to
-6.5 volts.

To ensure that the charge stored under the H bucket is completely transferred
when the floating gate induces the φ_s1 level, its empty level φ_he must be approximately
one-half volt less negative than φ_s1

$$\phi_{he} = \phi_{s1} + 0.5 = -9.4 + 0.5 = -8.9\ \text{volts}$$


However, to ensure that the φ_s2 level under the floating gate also completely inhibits
transfer of the H bucket charge, the full bucket level, φ_hf, in the H storage area must
not exceed φ_s2 - 0.7 volt.

$$\phi_{hf} = \phi_{s2} - 0.7 = -6.5 - 0.7 = -7.2\ \text{volts}$$


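The area of the H storage area, A_h, can be derived by rearranging the approximate
well bucket size expression

$$A_h = \frac{Q_s}{K_s\left(\sqrt{\phi_{he}} - \sqrt{\phi_{hf}}\right)} = \frac{0.518\,K_s}{K_s\left(\sqrt{8.9} - \sqrt{7.2}\right)} = 1.8\ \text{mil}^2$$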

During the enable mode the charge stored in the H area is transferred under the
slave end of the floating gate into the M storage area. To ensure that all of the
charge is transferred, it is necessary that the full well level, φ_m, of the M storage
area be approximately one volt more negative than φ_s1

$$\phi_m = \phi_{s1} - \left(\phi_p - \phi_a\right) - 1.0 = -9.4 - (3.5 - 1.25) - 1.0 = -12.65$$
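
The level-setting arithmetic for the H and M storage areas can be collected in one place. The sketch below is an illustration only: the variable names are labels chosen here, the margins (0.5, 0.7, and 1.0 volt) are those quoted in the text, and the area formula uses the magnitudes of the H-bucket levels in the first-term approximation of equation (2.8).

```python
import math

PHI_S1 = -9.4       # slave-end surface potential, C bucket empty (volts)
PHI_S2 = -6.5       # slave-end surface potential, C bucket full (volts)
PHI_P = 3.5         # surface potential under the polysilicon gate (volts)
PHI_A = 1.25        # surface potential under the metal gate (volts)
Q_OVER_KS = 0.518   # standard charge packet divided by K_s

# Empty level of H: one-half volt less negative than phi_s1.
phi_he = PHI_S1 + 0.5

# Full level of H: 0.7 volt more negative than phi_s2.
phi_hf = PHI_S2 - 0.7

# H area from the rearranged first-term well expression (magnitudes used).
area_h = Q_OVER_KS / (math.sqrt(abs(phi_he)) - math.sqrt(abs(phi_hf)))

# Level of M: one standard bucket depth plus one volt below phi_s1.
phi_m = PHI_S1 - (PHI_P - PHI_A) - 1.0

print(f"phi_he = {phi_he:.1f} volts")
print(f"phi_hf = {phi_hf:.1f} volts")
print(f"A_h    = {area_h:.2f} mil^2")
print(f"phi_m  = {phi_m:.2f} volts")
```

Evaluated this way, the H area comes out near 1.73 mil², slightly below the 1.8 mil² quoted in the text; the difference may reflect rounding or a layout allowance that the surviving text does not state.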


The voltage that must be applied to the polysilicon gate of the M storage area to
produce φ_m = -12.65 volts is -18.5 volts.

The design of the half-adder logic cell is very similar to the design of the
full-adder, and a schematic diagram of the half-adder is shown in Figure 2-23. Since
there are only two inputs (a, b) to a half-adder, the need for the I and K storage
cells disappears, so the layout is simpler.


Figure  2-23.  Schematic  Diagram  of  the  Half-Adder  Test  Cell 


A truth table for the half-adder logic cell is given in Table 2-5; the clock
sequences shown in Figure 2-22 are also applicable to the half-adder, except that φ4
is not required.

To perform binary multiplication it is necessary to use a two-input AND gate.
This function was mechanized by removing the floating gate and the H, M, and S storage
areas from the half-adder design. It was also necessary to move the discharge gate
and diode from the H storage area to the D storage area, as shown in Figure 2-24. A
truth table for the AND gate is given in Table 2-5. The AND gate operates from only
two clock phases.

In order to obtain design, layout, and testing experience with CCD signal
processing devices, a two-word 4-bit binary adder was included on the DP-1 chip. The
addition of the two binary words a1 - a4, b1 - b4 is performed in a straightforward
manner:

    Carry bit            c4  c3  c2
    First word           a4  a3  a2  a1
    Second word          b4  b3  b2  b1
    Sum              c5  s4  s3  s2  s1

Figure 2-24. Schematic Diagram of Two-Input AND Gate


The  least  significant  column  has  only  two  input  bits  so  that  a half-adder  is 
satisfactory;  however,  the  other  three  columns  have  three  inputs  and  full-adders 
are  required. 

A block diagram of the two-word 4-bit adder is shown in Figure 2-25. It will
be seen from the diagram that delay stages have been added to the input signal paths
of the most significant bits; this is to ensure that the input data arrive at the
full-adder input synchronously with the carry bit. In order to compensate for the
input delays applied to the most significant bits, it was also necessary to include
corresponding delays in the sum output lines of the least significant output bits.
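
The column arrangement described above (a half-adder in the least significant column, full-adders in the other three) can be sketched at the logic level. The sketch ignores the CCD delay stages, which affect timing only, and the helper names (`to_bits`, `from_bits`, `add_4bit`) are illustrative.

```python
def half_adder(a, b):
    """Return (sum_bit, carry_bit) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, g):
    """Return (sum_bit, carry_bit) for three input bits."""
    return a ^ b ^ g, (a & b) | (a & g) | (b & g)


def to_bits(value, width):
    """Least significant bit first: [x1, x2, ...]."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << k for k, bit in enumerate(bits))


def add_4bit(a, b):
    """Add two 4-bit words with one half-adder and three full-adders."""
    a_bits, b_bits = to_bits(a, 4), to_bits(b, 4)

    # Column 1 has only two inputs, so a half-adder is sufficient.
    s1, carry = half_adder(a_bits[0], b_bits[0])
    sums = [s1]

    # Columns 2 to 4 each receive a carry, so full-adders are required.
    for a_bit, b_bit in zip(a_bits[1:], b_bits[1:]):
        s, carry = full_adder(a_bit, b_bit, carry)
        sums.append(s)

    # The final carry is the fifth (most significant) bit of the sum.
    return from_bits(sums + [carry])


print(add_4bit(9, 7))
```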

A two-word 3-bit multiplier array was also included on the DP-1 chip. The
multiplier is a little more complex than the adder and involves the use of an AND
function besides half-adders and full-adders. The multiplication of the two 3-bit
numbers a1 - a3, b1 - b3 is performed in the usual manner.

The least significant column consists of an AND gate; the second least significant
column requires two AND gates and a half-adder. The third column requires three AND
gates, and a carry bit may also be received from the second least significant column
and must be added to the three output bits from the AND gates. Thus, overall, the
mechanization of that column used a full-adder and a half-adder, as shown in the block
diagram of the 3x3 multiplier (Figure 2-26). In the fourth column from the right, two
AND gates are required; however, since a carry bit may be received from both the
half-adder and the full-adder of the lower column, again four bits must be added
together. An identical combination of full-adder and half-adder was used to add the
four bits. The second most significant column requires only an AND gate and a
full-adder, the carry output from the full-adder providing the most significant bit in
the product.

So that the product bits are obtained in phase, it was necessary to add
delay stages to the most significant input and least significant output columns.
These delays are shown in Figure 2-26.
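
The column-by-column mechanization can be checked with a logic-level sketch. The gate counts per column follow the description above; the exact wiring of which bits enter the full-adder and which enter the half-adder in each column is an assumption made here, since Figure 2-26 is not reproduced, and the delay stages are omitted because they affect timing only.

```python
def half_adder(a, b):
    """Return (sum_bit, carry_bit) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, g):
    """Return (sum_bit, carry_bit) for three input bits."""
    return a ^ b ^ g, (a & b) | (a & g) | (b & g)


def multiply_3x3(a, b):
    """Multiply two 3-bit words using AND gates, half-adders, and full-adders."""
    a1, a2, a3 = a & 1, (a >> 1) & 1, (a >> 2) & 1
    b1, b2, b3 = b & 1, (b >> 1) & 1, (b >> 2) & 1

    # Column 1: a single AND gate.
    p1 = a1 & b1

    # Column 2: two AND gates and a half-adder.
    p2, c2 = half_adder(a2 & b1, a1 & b2)

    # Column 3: three AND gates plus the carry from column 2 (four bits).
    s3, c3_full = full_adder(a3 & b1, a2 & b2, a1 & b3)
    p3, c3_half = half_adder(s3, c2)

    # Column 4: two AND gates plus both carries from column 3 (four bits).
    s4, c4_full = full_adder(a3 & b2, a2 & b3, c3_full)
    p4, c4_half = half_adder(s4, c3_half)

    # Column 5: one AND gate and a full-adder; its carry is the top bit.
    p5, p6 = full_adder(a3 & b3, c4_full, c4_half)

    bits = [p1, p2, p3, p4, p5, p6]
    return sum(bit << k for k, bit in enumerate(bits))


print(multiply_3x3(7, 7))
```

Under this wiring the array uses nine AND gates, three half-adders, and three full-adders, and it reproduces ordinary multiplication for all 64 input pairs.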


3.  COMPUTATIONAL  ALGORITHM  TRADEOFFS 

Use of the two-, three-, and four-input adders to develop and design digital systems,
such as N-bit multipliers and adders, revealed interesting system configurations and
design requirements.

A few conceptual systems which utilize the two- and three-input adders and other
CCD circuit designs are depicted in Figures 3-1 through 3-4. The block diagrams shown
in Figures 3-2 and 3-4 represent the adder and multiplier designs of the DP-1 mask set.

The two-word 4-bit adder using two-input adders and OR gates as the basic building
blocks can be implemented in many ways, two of which are depicted in Figures 3-1
and 3-2.

Both designs accept the incoming information directly into the two-input adder.
The adder (Figure 3-2) operates on the concept that the sum and carry outputs of the
adjacent half adder are the inputs to succeeding half adders. Since the data generated
by adjacent adders are generated in a parallel fashion, no delays are inherent in the
adding operation of the array. The appropriate sum outputs are delayed to keep the
information moving synchronously through the adder.

Since the output of the two-input adder is either a (01) or a (10), representing the
inputs of a (1,0) or a (1,1) respectively, OR gates are used to truncate the array
design in lieu of two-input adders. Additional features of the OR function are design
simplicity, simple operation, and minimal real estate requirements as compared to the
two-input adder.
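
One way to see why an OR gate can stand in for a two-input adder is the familiar construction of a full adder from two half adders: the two carry outputs can never both be 1, so merging them needs no further addition. The sketch below illustrates that general property; it is not a transcription of the array in Figure 3-1, and the function names are illustrative.

```python
def half_adder(a, b):
    """Two-input adder: returns (sum_bit, carry_bit)."""
    return a ^ b, a & b


def full_adder_from_halves(a, b, g):
    """Full adder built from two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, g)

    # c1 = 1 forces s1 = 0, which forces c2 = 0, so the carries are exclusive.
    assert not (c1 and c2)

    return s2, c1 | c2


for a in (0, 1):
    for b in (0, 1):
        for g in (0, 1):
            print(a, b, g, "->", full_adder_from_halves(a, b, g))
```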

The adder array shown in Figure 3-1 uses the OR function within the array
between successive two-input adders. This simplifies the design and uses fewer
two-input adders than the two-input adder array discussed above. As an example, the
two-input adder array of Figure 3-2 requires 10 two-input adders and three OR gates
to perform the same arithmetic function as the array of Figure 3-1, which requires
only seven two-input adders and three OR gates. Twenty-four delay gates are required
by the latter design, as compared to 11 delay gates required by the former design.
Even with the additional 13 delay gates, the array of Figure 3-1 will require less
real estate than the design of Figure 3-2, since delay gate real estate requirements
are very small.

The two additional delays required in the design of Figure 3-1, as compared to the
adder of Figure 3-2, to maintain synchronization of all the data bits will degrade
the signal level because of transfer inefficiency. The optimum design, therefore, is
one that minimizes the number of delayed transfers required to perform arithmetic
functions.

Figure 3-1. Four-Bit Full Adder Using Half Adders

Figure 3-2. Two-Word 4-Bit Adder Using Two-Input Adder Circuits

Figure 3-3. 4X4 Multiplier Using Half (2,2) Adders

Figure 3-4. Two-Word 3-Bit Multiplier Using Two-Input Adders and OR Circuits

The multiplier (Figure 3-4) basically is the two-input adder array, with AND gates in
the front end which generate the required a_i b_j products. This approach was used
instead of generating the required a_i b_j products at the various times and places
they are needed in the design depicted in Figure 3-3. Generating the a_i b_j products
in the front end eliminates channel crossings and simplifies the design.


4.  MASK  SET  DP-1 


4.1  OVERVIEW 

The mask set DP-1 design was based in part on the experience gained through the
use of mask set DP-0. As the concepts for arithmetic computation continued to evolve,
two basic schemes emerged. Previous chapters discussed some of the important
tradeoffs involved in comparing these approaches. One purpose of DP-1 was to permit
additional experimental investigation of these techniques. Accordingly, DP-1 contained
two realizations of each circuit function; one realization employed three-input adders,
the other two-input adders. The list of circuits and test devices on DP-1 is
contained in Table 4-1. Chips produced from this mask set provide complete test vehicles
for comparing and contrasting the two different approaches and for gaining experience
with arithmetic function design, production, and testing. Figure 4-1 shows an overall
view of a DP-1 chip.


Table  4-1.  Circuits  and  Test  Devices  on  DP-1 
Circuits 

Two-word  3-bit  multiplier 
Two-word  4-bit  adder 

Test  Devices 
AND  gates 
Two-input  adder 
Three-input  adder 
Threshold  test  FETS 


4.2  DETAIL  DESCRIPTION  OF  TWO-INPUT  FULL  ADDER 

The  general  operation  of  the  two-input  adder  is  discussed  in  Section  2.3.3  of 
this  report.  A photograph  of  the  circuit  is  depicted  in  Figure  4-2.  Figure  4-3 
depicts  operation  of  the  two-input  adder  as  a function  of  the  surface  potentials, 
gate  voltages,  and  timing  waveforms.  The  equations  used  in  the  two-input  design, 
in  determining  the  gate  voltage,  charge,  and  respective  surface  potentials  are 
presented  in  Section  4.2.2. 




Figure  4-2.  Two-Input  Adder  Circuit 


[Figure 4-3 annotations: INPUT, SUM STORE, CARRY STORE, DRAIN, GRD, φ1 (BEFORE STEP);
marked levels -9, -17, -18, and -19.5]


Figure  4-3.  Detail  Schematic  Indicating  Surface  Potentials,  Gate 
Voltages,  and  Timing  Waveforms  of  Two-Input  Adder 

The operation of the two-input adder is as follows. The amount of charge Q to be
used in the operation of the two-input adder is determined by input gates C1 and C2.
Gate C1 is actually the input signal gate, but its surface potential relative to
that of gate C2 determines the storage well size. The input diodes are pulsed on,
their surface potential then being just above that of C1. A metal gate is used as a
transfer gate and is internally connected to gate S1, which is the first storage gate.
The first storage gate is the same size as gate C2 and all other storage gates,
designated by the asterisk (*). All storage gates are poly gates and all transfer
gates are metal gates.

As indicated in Figures 4-2 and 4-3, the two-input adder consists of two channels:
"sum" and "carry." The two channels have only two gates in common. If the incoming
signal, applied simultaneously to gates C1α and C1β, is a (1,0) or (0,1), only one
charge packet is generated; it is processed in the sum channel, resulting in a
digital sum of (1,0) at the output. If the incoming signal applied to gates C1α and
C1β is a (1,1), two charge packets are generated simultaneously and are processed in
the carry channel, resulting in a digital sum of (01) at the output (Figure 4-4). As
seen from the timing diagrams, when input signal (1,0) or (0,1) occurs, gate Cy in
the carry channel is biased such that no charge can flow under the gate designated VS
in the carry channel. This gate is the master end gate of the floating gate amplifier.
Since there is no charge under the master gate of the floating gate amplifier, the
slave end is biased such that when gates Sy and SS are turned on, the charge contained
under S1 will flow from S1 and be stored under the poly gate of SS until Sy is
turned on.

If, however, the input signal is a (1,1), two equal charge packets have been
generated and are simultaneously processed under gate S1. Since gate S1 can only hold
one Q quantity of charge, Cy is biased such that the other charge is dumped under the
master end of the floating gate, since the gate Sy is biased off during this time
interval. Since charge is now deposited under the master end of the floating gate,
the surface potential of the slave end of the floating gate (sum channel) is such
that when Sy is turned on, the other charge Q still residing under S1 cannot flow
past the slave end of the floating gate and into gate Sg when SS is turned on.

After this time interval is over, a further gate is turned on and the charge Q
contained in the master end is then transferred onward and detected as an output,
while the charge contained under S1 and Sy, which was blocked by the slave end of the
floating gate, is drained out by turning on gate GRD.
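
The charge routing just described can be summarized in a small model. It is a logic-level sketch only: each input is assumed to contribute one unit packet Q, the first storage gate is assumed to hold exactly one packet, and the dictionary keys are labels chosen here rather than names from the report.

```python
def two_input_adder(a, b):
    """Route unit charge packets through the sum and carry channels.

    Returns a dict with the sum output, the carry output, and the number
    of packets removed through the drain (gate GRD).
    """
    packets = a + b

    # The first storage gate holds one packet; a second packet spills
    # under the master end of the floating gate in the carry channel.
    under_first_store = min(packets, 1)
    under_master = max(packets - 1, 0)

    # Charge under the master end makes the slave end block the sum channel.
    blocked = under_master == 1

    return {
        "sum": 0 if blocked else under_first_store,
        "carry": under_master,
        "drained": under_first_store if blocked else 0,
    }


for inputs in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(inputs, "->", two_input_adder(*inputs))
```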
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a)  Output  When  Input  is  (1,1) 


b)  Output  When  Input  is  (1,0)  or  (0,1) 


Figure  4-4.  Two-Input  Adder 

4.2.1 Detail Analysis of Two-Input Adder
Determination  of  Constants 

In the equations depicted in Section 2.4.4, the following constants are assumed
and used throughout: N_A = 10^15/cm³, T = 300°K, n_i = 1.4 x 10^10/cm³, <100> material,
and Q_ss = 1.4 x 10^-8 coul/cm².

For a metal - SiO2 - Si (n-type) structure, the flat band voltage

$$V_{FB} = \phi_{ms} - \frac{Q_{ss}}{C_{ox}}$$

becomes

$$V_{FB} = -0.3\ \mathrm{V} - \frac{1.4 \times 10^{-8}\ \mathrm{coul/cm^2}}{C_{ox}} \qquad (4.2)$$

For  poly  - Si 02  - Si  (n-type)  structures,  flat  band  voltage 

$$q\,\epsilon_s N_A = 1.7 \times 10^{-16}$$


Determination  of  Amount  of  Charge  Q 

The  amount  of  charge  Q can  be  calculated  by  using  equation  (2.3)  of  Section 
2.4.4  which,  when  qQ  = 0 (no  fat  zero),  becomes 


[Equation (4.4): not legible in source]


From the VG/φs curves (Figure 4-5), with a gate voltage of VG = -7 volts applied to C2, and with the surface potential under gate C1 made equal to -2.5 volts by supplying -7 volts to its gate, the amount of charge becomes


Q = 0.264 x 10^-12 Coul



Figure 4-5. VG/φs Curves for DP1-2 Wafer No. 5 (+10 V Bias)

This assumes for the poly gate an oxide thickness of 1000 Å and an area of 0.78 mil².
Determination of Surface Potential Under Gate S1

When the above charge Q is dumped under gate S1, which is subjected to a gate voltage of -9 volts and has a surface potential of -6 volts as determined from the VG/φs curve, the S1 surface potential will change to a new value determined by

[equation not legible in source]

which becomes after substitution


φs = -4.49 volts


Determination  of  Surface  Potential  Under  Floating  Gate  Master  End 


As the charge Q is dumped under the floating gate, after it has been subjected to a gate voltage of -17 volts which produces a surface potential of -13.5 volts (as determined from the VG/φs curves), its new surface potential is calculated by the following equation:

[Equation (4.6): not legible in source]

Since the Cox area of the two-input adder is 0.15 mil² with an oxide thickness of 1000 Å, the surface potential from equation (4.6) becomes

φs = -5.36 volts

Determination  of  Surface  Potential  Under  Floating  Gate  Slave  End 

As  charge  is  being  dumped  under  the  master  end  of  the  floating  gate,  the  slave 
end  surface  potential  will  vary,  in  response  to  the  amount  of  charge  being  dumped 
under  the  master  end,  and  can  be  calculated  by  using  equation  (2.7)  of  Section  2.4.4. 
Because C = C [subscripts not legible in source] in the design of the two-input adder, this equation can be reduced to

[equation not legible in source]

and becomes


φs = -11.84 volts


Determination  of  Amount  of  Charge  Contained  Under  S^  and  S-j. 

To ensure proper operation of the two-input adder, the amount of charge contained under gates C-j and Cj simultaneously must be less than the amount of charge generated by C2, providing that the surface potentials are constrained to those determined by the change in the floating gate slave end. This calculation has to be done in two steps because S, gate is a poly gate with an oxide thickness of 1000 Å, and Sj a metal gate with an oxide thickness of 2000 Å. Taking these oxide thicknesses and the surface potential into consideration, as depicted in Figure 4-3 under the floating gate, equation (4.8) below will determine the amount of charge Q,


(4.8) 


This calculation yields a charge of

Q = 0.3668 x 10^-12 Coul


which is greater than the charge generated by gate C2, which was determined to be equal to 0.264 x 10^-12 Coul.

In  solving  the  above  equations  it  is  assumed  that  is  a constant  of  integration. 
This  constant  introduces  approximately  3 percent  error  in  the  calculation,  which  is 
well within experimental error in determining the voltages from the VG/φs experimental
curves. 

4.3  DETAIL  DESCRIPTION  OF  THREE-INPUT  ADDER  CIRCUITS 

The  designs  of  the  half-adder  and  full-adder  logic  cells  described  in  Section 
2.3.4  are  very  different  from  the  full-adder  incorporated  on  the  DP-0  chip.  It 
therefore  seemed  desirable  to  lay  out  a full-adder,  a half-adder,  and  a two-input  AND 
gate  on  the  DP-1  chip.  To  assist  in  the  analysis  of  test  results  it  was  decided  to 
also  place  a floating  gate  amplifier  on  the  chip.  The  full-adder  and  half-adder  have 
separate  test  areas,  but  the  floating  gate  amplifier  and  two-input  AND  gate  are  com- 
bined in  a single  test  area.  All  three  test  areas  have  the  same  22  bonding  pad  layout, 
and  wherever  possible  the  identical  bonding  pads  are  used  for  the  same  function  on 
each  of  the  three  test  cells.  Therefore,  it  is  possible  to  move  directly  from  the  AND 
gate  to  the  half-adder  and  then  to  the  full-adder  without  changing  the  test  setup. 

In the full-adder logic cell layout shown in Figure 4-6, the polysilicon storage areas are marked to correspond to the schematic diagram of Figure 2-21.

During  the  preliminary  layout  of  the  multiplier  and  adder  arrays  it  was  found 
that  in  order  to  reach  the  inside  logic  cells  with  the  large  number  of  clock  lines 
required,  it  was  necessary  to  make  many  crossings  of  CCD  channels  located  at  the  array 
perimeter.  Since  the  only  way  of  crossing  a CCD  channel  with  an  aluminum  line  is  over 
a polysilicon  gate,  the  number  of  crossovers  required  governs  the  length  of  the  poly- 
silicon gates.  In  the  preliminary  layout  the  optimum  layout  required  a maximum  of 
three  aluminum  crossovers  of  some  polysilicon  gates.  In  the  design  rules  used  for 
the DP-1 layout, the aluminum width and spacing are both 0.3 mil. Three metal lines between two metal gates, together with the four spaces that separate them, therefore require a polysilicon bridge of 7(0.3) = 2.1 mil. In addition, the aluminum gates overlap the polysilicon gate by 0.1 mil at each end, so the polysilicon gate has to be lengthened to allow for the overlap; this makes the polysilicon gate length 2.3 mil.

Figure 4-6. Composite Layout Diagram of the Full-Adder Logic Cell

Since the CCD channel used on the DP-1 is 0.3 mil wide, the area of a polysilicon gate that is also used as a bridge for three aluminum crossovers is 0.69 mil².
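The gate sizing arithmetic above can be restated as a short calculation. This is a sketch only: it assumes, as the design rules state, that the aluminum width and spacing are equal, and the function and parameter names are illustrative.

```python
# Sketch of the polysilicon bridge sizing; all dimensions are in mils.

def bridge_gate(n_crossovers, pitch=0.3, overlap=0.1, channel_width=0.3):
    """Return (bridge length, gate length, gate area) for a polysilicon gate
    that carries n_crossovers aluminum lines between two metal gates."""
    # n lines between two metal gates need n line widths plus (n + 1) spaces
    bridge = (2 * n_crossovers + 1) * pitch
    # the metal gates overlap the polysilicon gate at each end
    length = bridge + 2 * overlap
    area = length * channel_width
    return bridge, length, area


bridge, length, area = bridge_gate(3)
print(f"bridge = {bridge:.1f} mil, gate length = {length:.1f} mil, area = {area:.2f} mil^2")
```

Written this way, the cost of each additional crossover is explicit: every extra aluminum line adds two pitches (0.6 mil) to the gate length.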

In an attempt to keep as many storage well sizes and clock-line drive requirements as similar as possible, it was decided to make the 2.3 x 0.3 mil polysilicon gate area the standard throughout the layout of the three test cells and arrays.
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In  the  full  adder  layout  shown  in  Figure  4-6  the  polysilicon  areas  labeled  A,  B, 

G, 0, M, I, K, S, N all measure 2.3 x 0.3 mil. The polysilicon area C under the floating gate was made smaller in order to obtain as large a charge in V$ as possible and measures 1.0 x 0.3 mil.

It  was  shown  in  Section  2.3.4  that  in  order  to  provide  adequate  noise  margins  when 
the  standard  charge  stored  in  D cell  was  transferred  to  the  H storage  cell,  it  was 
necessary to spread out the charge by making the H area 1.8 mil². From Figure 4-6, the
H storage  area  is  in  fact  approximately  twice  the  standard  storage  area. 

In  addition  to  the  need  to  make  crossovers  within  a logic  cell,  the  layout  of  the 
multiplier  and  adder  arrays  also  requires  that  crossovers  be  made  in  between  logic  cells. 
This  is  made  possible  by  placing  one  shift-register  stage-delay  in  the  carry-bit  line 
between  two  full-adders.  The  polysilicon  gate  of  the  single  stage  shift-register 
allowed  three  crossovers  which  was  found  to  be  adequate. 

The  single-bit  stage  delays  can  be  seen  in  the  block  diagrams  of  Figures  2-26  and 
2-27.  Due  to  insertion  of  delay  stages  to  provide  crossovers,  it  was  also  necessary  to 
add  extra  delays  in  the  sum  output  lines  to  ensure  that  the  output  bits  remained 
synchronous. 

The  two-word  4-bit  adder  array  and  the  two-word  3-bit  multiplier  array  were  laid 
out  within  the  same  33  pad  configuration;  this  enabled  testing  to  be  performed  with  a 
single  33  probe  card.  The  layout  of  the  three  test  cells  and  two  arrays  is  shown  in 
Figure 4-1. In order to compensate for transfer losses within sequential individual three-input adders of the adder array, the φ1 clock line to each three-input adder was taken to a separate bonding pad. However, to carry out the same procedure on the multiplier would have made the clock line interconnect pattern too complex for an evaluation array.

It was decided to test the different test devices and arrays in ascending complexity; thus the two-input AND gate was the first device tested. The AND gate (schematic diagram in Figure 2-24) is a very simple logic cell, requiring only a two-phase clock. No difficulty was experienced in verifying correct operation of this cell, and output signals of approximately one volt amplitude were produced (Figure 4-7).

The  half-adder  test  cell  involves  the  use  of  a floating  gate,  and  is  therefore  more 
complex  than  the  AND  gate.  The  schematic  diagram  of  the  half-adder  is  shown  in  Figure 
2-23.  Correct  operation  of  the  half-adder  was  verified  with  clock  phases  as  shown  in 
Figure  2-22  and  at  clock  amplitudes  within  10  percent  of  those  predicted  in  Section 
2.4.1.  A photograph  of  the  input  and  output  waveforms  of  the  half-adder  logic  cell  is 
shown  in  Figure  4-8. 
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AND  Output  Signal  1 V/cm 
"a"  Input  Signal  at  f 5 V/cm 
"b"  Input  Signal  at  f/4  5 V/cm 
Inject  Diode  Waveform  5 V/cm 


AND  Output  Signal,  Scale  1 V/cm 
"a"  Input  Signal  at  f 5 V/cm 
"b"  Input  Signal  at  f/8  5 V/cm 
Inject  Diode  Waveform  5 V/cm 


Figure 4-7. Input and output signals of the two-input AND gate showing that an output bit is generated only when both input bits are at logic "1" (positive pulses = "1"), with a correct half cycle delay
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Sum  Output  Signal 


Scale  1 v/cm 


Carry  Output  Signal, 


"A"  Input  Signal 
"B"  Input  Signal 


Scale  1 v/cm 


Figure  4-8.  Input  and  Output  Signals  of  Half-Adder  Gate, 
Showing  a Sum  Output  Each  Time  A or  B 
Equal  "1",  But  Not  When  Both  Equal  "1" 


The full-adder is the most complex of the individual test devices, having three inputs and three outputs. Two of the output channel paths converge to form an OR function as shown in Figures 2-21 and 4-6.
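The logic function being verified can be written down independently of the CCD implementation. The sketch below is a Boolean model only: it builds a full-adder from two half-adders whose carry outputs are combined by an OR, which is one common decomposition and is assumed here to correspond to the converging output channels. The function names are illustrative, and the third input is called g as in the figure labels.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, g):
    """Full-adder built from two half-adders; the two carries are ORed."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, g)
    return s, c1 | c2


# Truth table over all eight input combinations
truth_table = {(a, b, g): full_adder(a, b, g)
               for a, b, g in product((0, 1), repeat=3)}
for inputs, (s, c) in truth_table.items():
    print(inputs, "sum =", s, "carry =", c)
```

The two carries can never both be "1" for the same inputs, so a simple merging of the two paths is sufficient to form the carry output.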

The full-adder was shown to function with the same clock phases and clock amplitudes as the half-adder. Photographs of the input and output waveforms of the full-adder are shown in Figure 4-9. In Figure 2-22 the phase relationships of φ4 and φ5 are shown to be identical; the two waveforms differ only in amplitude: φ4 switches from -4.5 to -8.7 volts, whereas the φ5 clock line switches from -4.5 to -24 volts. As an experiment the φ4 clock line was removed and the K gate connected to φ5; the full-adder continued to function correctly and the amplitude of the output signals was unchanged.
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Sum-Bit  Output 
Carry-Bit  Output 
A Signal  Input 
B Signal  Input 
G Signal  Input 


Sum-Bit  Output 
Carry-Bit  Output 
A Signal  Input 
B Signal  Input 
G Signal  Input 


Figure  4-9.  Input  and  Output  Waveforms  Obtained  from  Full-Adder  Test  Cell 
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Sum-Bit  Output 


Carry-Bit  Output 


A Signal  Input 
B Signal  Input 
G Signal  Input 


Sum-Bit  Output 


Carry-Bit  Output 
A Signal  Input 
B Signal  Input 
G Signal  Input 

Figure  4-9.  Input  and  Output  Waveforms  Obtained  from  the 
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4.4  ARRAY  FUNCTIONAL  TESTING 

4.4.1  Functional  Testing  of  the  4-Bit  Adder-Array 

During  the  testing  of  the  multiply  and  add  arrays  that  use  the  full-adder  cell, 
several  mask  errors  were  found.  However,  it  was  possible  to  circumvent  these  errors 
and  sufficient  testing  was  carried  out  to  show  that  both  arrays  performed  their 
arithmetic  functions  correctly. 

Two  mask-errors  were  discovered  on  the  2-word,  4-bit  adder  array;  one  of  the 
errors  was  the  omission  of  two  windows  in  the  TEOS  etch  mask.  The  omission  resulted  in 
two  metal  gates  having  a layer  of  TEOS  oxide  between  the  aluminum  gate  and  the  gate 
oxide,  preventing  the  gates  from  controlling  the  channels.  This  error  was  circumvented 
by  not  carrying  out  the  TEOS  process  step  and  risking  the  high  probability  of  metal 
breakage. 

The second mask error found on the adder array was the omission of the aluminum conductor from a contact hole associated with the least significant output bit, resulting in a logical "0" output under all input combinations. This error was circumvented by using only input words in which both least significant bits were the same, so that their sum is always a logical "0". Note that the proper operation of the logic circuits in the least significant channel can be verified with this approach.

The  adder-array  is  designed  to  add  any  two  4-bit  binary  numbers,  from  0000  to  1111 
(decimal  30),  together  and  produce  a 5-bit  binary  sum  of  value  from  00000  to  11110 
(decimal  60). 
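The arithmetic performed by the array can be modeled as a chain of full-adders with the carry rippling toward the most significant bit. The following sketch is a behavioral model only, not a description of the charge-domain circuit. It adopts the numbering used in this report, in which 1111 corresponds to decimal 30 (bit weights 2, 4, 8, 16); the function names are illustrative.

```python
def full_adder(a, b, c):
    """Return (sum bit, carry bit) for three input bits."""
    total = a + b + c
    return total & 1, total >> 1


def add_words(a_word, b_word):
    """Add two 4-bit words given as strings, most significant bit first.
    Returns the output bits [S1, S2, S3, S4, S5], least significant first."""
    a_bits = [int(ch) for ch in reversed(a_word)]
    b_bits = [int(ch) for ch in reversed(b_word)]
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)  # S5 is the carry out of the last full-adder
    return out


def decimal_value(bits_lsb_first):
    """Decimal value in the report's convention (least significant weight is 2)."""
    return sum(bit << (i + 1) for i, bit in enumerate(bits_lsb_first))


s = add_words("1110", "0100")
print(s, decimal_value(s))
```

In each of the input pairs used for the photographs the two least significant input bits are equal, so S1 is "0" in the model as well, which is the condition used to work around the missing output contact.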

Testing was initially carried out at room temperature (25°C) and at a clock frequency of 10 kHz. The clock frequency was divided down by 16 to produce a 625 Hz word rate so that only one output word was displayed on the monitoring CRT at one time.

By using this technique we could check that the phase correcting shift register stages had been inserted correctly, since all output bits should be coincident in time.

The  six  photographs  of  Figures  4-10  through  4-15  show  the  4 most  significant  bits  of 
the  output  sum  for  various  input  number  combinations. 

4.4.2  Maximum  Clock-Rate  of  the  4-Bit  Adder  Array 

The  maximum  theoretical  clock  rate  of  the  adder-array  is  determined  by  the  thermal 
diffusion  time  along  the  longest  length  of  electrode  along  which  a charge  is  moved  from 
one potential well to the next. In the full-adder design, the φ1 electrode that controls the three storage areas 0, C, and I has the maximum length. Taking into consideration that the original layout was multiplied by 1.5, the length of the φ1 gate is 10.08 mil.

The thermal diffusion time can be determined from:

$$\tau = \frac{4 L^2}{\pi^2 \mu}\,\frac{q}{kT}$$
[Not shown, S1 = 0]

Output bit S2 = 1

Output bit S3 = 0

Output bit S4 = 0

Output bit S5 = 1

(Decimal  36) 

Figure 4-12. Output Signals from the 2-Word, 4-Bit Adder Array When
the  a-Word  = 1110  (28)  and  the  b-Word  = 0100  (8) 


[Not shown, S1 = 0]

Output bit S2 = 1

Output bit S3 = 1

Output bit S4 = 1

Output bit S5 = 1

(Decimal  60) 


Figure  4-13.  Output  Signals  from  the  2-Word,  4-Bit  Adder  Array  When 
the a-Word = 1111 (30) and the b-Word = 1111 (30)


[Not shown, S1 = 0]

Output bit S2 = 1

Output bit S3 = 0

Output bit S4 = 1

Output bit S5 = 0

(Decimal  20) 
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Figure  4-14.  Output  Signals  from  the  2-Word,  4-Bit  adder  Array  When 
the  a-Word  = 0101  (10)  and  the  b-Word  = 0101  (10) 


[Not shown, S1 = 0]

Output bit S2 = 1

Output bit S3 = 1

Output bit S4 = 0

Output bit S5 = 1

(Decimal  44) 


Figure  4-15.  Output  Signals  from  the  2-Word,  4-Bit  Adder  Array  When 
the a-Word = 1110 (28) and the b-Word = 1000 (16)
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where  the  length  of  the  transmission  electrode  is 

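$$L = 27.7 \times 10^{-3}\ \text{cm}, \qquad \pi^2 = 9.87, \qquad \frac{kT}{q} = 0.0259\ \text{volts}$$

and the mobility for NA = 10^15 is

$$\mu = 5 \times 10^{2}\ \text{cm}^2/\text{V sec}$$

The waveform of the φ1 clock has a symmetrical aspect ratio, so that the thermal diffusion time is one-half the clock period and so the maximum theoretical clock rate is

$$f_{max} = \frac{1}{2\tau} = \frac{\pi^2}{8 L^2}\,\frac{kT}{q}\,\mu = \frac{9.87 \times 0.0259 \times 5 \times 10^{2}}{8 \times 7.67 \times 10^{-4}} = 20.8\ \text{kHz}$$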
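The clock rate estimate can be reproduced with a few lines of code. The sketch below uses only the values quoted above; the function name and its parameters are illustrative.

```python
import math


def max_clock_rate(length_cm, mobility, kt_over_q=0.0259):
    """Thermal-diffusion-limited clock rate in Hz.

    tau = 4 L^2 / (pi^2 * mu * kT/q) is the thermal diffusion time, and a
    symmetrical clock allows half a period for the transfer: f_max = 1 / (2 tau).
    """
    tau = 4.0 * length_cm ** 2 / (math.pi ** 2 * mobility * kt_over_q)
    return 1.0 / (2.0 * tau)


f_max = max_clock_rate(27.7e-3, 5e2)
print(f"f_max = {f_max / 1e3:.1f} kHz")

# The rate scales as 1 / L^2, so halving the electrode length quadruples it.
print(f"ratio for L/2 = {max_clock_rate(27.7e-3 / 2, 5e2) / f_max:.1f}")
```

The inverse-square dependence on electrode length is the reason the long φ1 gate, rather than any other feature of the layout, sets the theoretical limit.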

Testing on a probe station at clock rates above 11 kHz proved very difficult due to the clock pulse coupling between the probes. Wafer DP1, lot 3, No. 9 was initially tested on the probe station, mapped, and diced, and several samples were bonded. All of the following tests were performed with bonded chips. The outputs of the 3 most significant bits were photographed at an operating frequency of 11 kHz. Figure 4-16 shows the correct output 011xx when the a-word = 1100 and the b-word = 0000. The b-word was then switched to 0100 and the 3 output bits correctly switched to 100xx as shown in Figure 4-17.

The clock frequency was then increased to 50 kHz and the b-word input switched as before; the array performed correctly as shown in Figures 4-18 and 4-19. Note that the output amplitudes at 50 kHz are lower than at 11 kHz. This is consistent with the predicted limit of 20.8 kHz for the complete transfer of carriers by thermal diffusion.

Output patterns for both of the input conditions were photographed at clock frequencies of 100 kHz and 175 kHz and the results are shown in Figures 4-20 through 4-23. Note that the logic "1" output levels do not attenuate significantly as the frequency is increased; however, they become obscured as the logic "0" output levels (fat-zero) grow larger.
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Output  bit  S3  = 1 
Output bit S4 = 1


Output bit S5 = 0
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Figure  4-18.  Three  Most  Significant  Output  Bits  from  the 
Adder  Array  Operating  with  a 50  kHz  Clock 
and  Inputs  of  a = 1100  and  b = 0000 


Output bit S3 = 0


Output  bit  S4  = 0 


Output  bit  S5  = 1 


Figure  4-19.  Three  Most  Significant  Output  Bits  from  the 
Adder  Array  Operating  with  a 50  kHz  Clock 
and  Inputs  of  a = 1100  and  b = 0100 
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Output  bit  S3  = 1 
Output bit S4 = 1


Output  bit  S5  = 0 


Figure  4-20.  Three  Most  Significant  Output  Bits  from  the 
Adder  Array  Operating  with  a 100  kHz  Clock 
and  Inputs  of  a = 1100  and  b = 0000 


Output  bit  S3  = 0 
Output  bit  S4  = 0 


Output bit S5 = 1
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Figure  4-21.  Three  Most  Significant  Output  Bits  from  the 
Adder  Array  Operating  with  a 100  kHz  Clock 
and  Inputs  of  a = 1100  and  b = 0100 
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Horizontal  Scale  IV/Div 


Output bit S3 = 1

Output bit S4 = 1


Output bit S5 = 0
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Figure  4-22.  Three  Most  Significant  Output  Bits  from  the 
Adder  Array  Operating  with  a 175  kHz  Clock 
and  Inputs  of  a = 1100  and  b = 0000 
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Output  bit  S3  = 0 
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Figure  4-23.  Three  Most  Significant  Output  Bits  from  the 
Adder  Array  Operating  with  a 175  kHz  Clock 
and  Inputs  of  a = 1100  and  b = 0100 
By switching the b3 input and observing the output patterns on the oscilloscope,
it  can  be  seen  that  the  adder  array  is  performing  the  correct  arithmetic  functions  up 
to  just  beyond  200  kHz,  but  with  a deteriorated  signal-to-noise  level. 

4.4.3  Operating  Temperature  Range  of  the  4-Bit  Adder  Array 

The operating temperature range was determined by functional testing in a temperature controlled chamber.

The clock voltages were adjusted at a frequency of 11 kHz and at 25°C so that the 3 most significant output bits from the adder array were performing correctly for each input combination and with a maximum signal-to-noise ratio. The temperature was increased in 10°C increments while the inputs were switched and the outputs monitored; no change in performance was observed at 35°C or 45°C.

At  55°C  the  fat-zero  level  of  sum-bit  and  carry-bit  increased;  sum-bit 
output  remained  unchanged. 


At 65°C the full-adder in the fourth channel ceased to switch; the sum-bit output remained at "0", and the carry-bit output remained at "1". The S3 sum-bit output continued to function correctly.

At  75°C  the  sum-bit  output  inverted  and  the  fat-zero  level  of  began  to 
increase. 

At  85°C  no  further  change.  At  this  point  the  voltage  on  the  control  gate  C,,  was 
adjusted  in  an  attempt  to  reduce  the  fat-zero,  but  this  was  unsuccessful. 

At  95°C  no  further  change. 

At  105°C  the  S3  sum-bit  output  inverted. 

At 110°C the fat-zero level of all outputs had increased so that no signals were discernible. It should be noted that the combination of high temperature and low
frequency  is  the  most  difficult  operating  condition  from  the  standpoint  of  thermal 
leakage.  Indeed,  proper  operation  at  125°C  could  be  assured  simply  by  operating  the 
existing  device  at  a frequency  above  about  500  kHz. 

The temperature was then reduced to 25°C and all outputs resumed operating correctly. The temperature was then lowered in 10°C steps, the inputs switched, and again the 3 most significant output bits monitored. No change in performance was noted at 15°C or 5°C.

At -5°C, the and outputs ceased to switch correctly with some input combinations, indicating that the carry-bit from the second-most-significant full-adder was not being transferred.

At -15°C to -55°C the fat-zero level was reduced, but there was no further change in arithmetic performance. The temperature was then reduced to -65°C and the control line adjusted so that all channels performed correctly with a maximum signal-to-noise ratio. Figures 4-24 and 4-25 show the 4 most significant output bits from the adder-array for two different input combinations when the chip is operating at -65°C.


[Not shown, S1 = 0]

Output bit S2 = 1

Output bit S3 = 0

Output bit S4 = 0

Output bit S5 = 1


(Decimal  36) 


Figure 4-24. Output Signals from the 2-Word, 4-Bit Adder Array
when a = 0100 (8) and b = 1110 (28) Operated at -65°C


[Not shown, S1 = 0]

Output bit S2 = 0

Output bit S3 = 1

Output bit S4 = 0

Output bit S5 = 1


(Decimal  40) 


Figure 4-25. Output Signals from the 2-Word, 4-Bit Adder Array
when a = 1010 (20) and b = 1010 (20) Operated at -65°C


4.4.4 Functional Testing of the 3-Bit Multiplier Array
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Following  the  demonstration  of  the  adder  array,  the  2-word,  3-bit  multiplier  array 
was  also  functionally  tested.  Two  mask  errors  were  found  which  limited  its  operation. 
One  was  a 0.05  mil  discontinuity  in  a metal  conductor  carrying  an  input  signal  and  the 
other  error  was  the  omission  of  a connection  to  a polysilicon  gate  in  the  sum  channel 
of  the  most  significant  full-adder.  However,  since  the  carry  output  of  the  full-adder 
was  correctly  connected  and  functional,  it  was  possible  to  demonstrate  all  channels  in 
the  multiplier. 

By  keeping  the  a^  input  to  the  open  conductor  at  logic  "0"  and  exercising  all  the 
other  five  inputs  through  all  possible  combinations,  it  was  possible  to  completely 
demonstrate  the  correct  logic  operation  of  the  entire  multiplier  array. 
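The test strategy of holding one input at logic "0" and stepping the remaining five inputs through every combination can be sketched as follows. This is a behavioral model of a 3-bit by 3-bit multiplier, not of the CCD array itself; the index of the open input (STUCK) is hypothetical, since the report's subscript is not legible, and the function names are illustrative.

```python
from itertools import product


def multiply_3bit(a_bits, b_bits):
    """Shift-and-add model of a 3-bit by 3-bit multiplier.
    Bits are given least significant first; each partial product is an AND term.
    Returns the six output bits, least significant first."""
    total = 0
    for i, a in enumerate(a_bits):
        for j, b in enumerate(b_bits):
            total += (a & b) << (i + j)
    return [(total >> k) & 1 for k in range(6)]


STUCK = 0  # hypothetical position of the open a-word input, held at logic "0"

cases = []
for bits in product((0, 1), repeat=5):
    a_bits = list(bits[:2])
    a_bits.insert(STUCK, 0)      # the open input is always "0"
    b_bits = list(bits[2:])
    cases.append((a_bits, b_bits, multiply_3bit(a_bits, b_bits)))

print(len(cases), "input combinations exercised")
```

With one of the six inputs fixed, 32 of the 64 possible input combinations remain available to exercise the array.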

There are six parallel outputs from the multiplier, and 196 different output numbers, so that producing a meaningful photograph showing simultaneous outputs is quite difficult. For simple combinations where only four outputs are changing, the output pulses are similar to those shown for the adder-array. However, when all logic cells are operating and charges have to propagate through several full-adders, there is some deterioration of the charge and a difference in the amplitude of the most significant output bits can be observed, the decrease in amplitude being dependent on the number of sequential charge buckets. Nevertheless, correct operation for all input combinations was demonstrated.

4.4.5  Interfacing  the  CCD  Devices  to  TTL  Drivers 

It  is  very  easy  to  interface  CCD  data  processing  devices  to  TTL  logic  gates;  this 
was  demonstrated  by  adding  +10  volts  to  the  substrate  bias  and  all  of  the  clock  and 
control  lines,  as  shown  in  Figure  4-26. 

The standard output swing of ground to 4.5 volts from the TTL gate is connected directly to an a, b, or g input gate on the full-adder or other CCD device. The phase relationship and amplitude of the various clock waveforms are shown in Figure 4-27; they are referred to the complete schematic of the full adder shown in Figure 4-24.

Note  that  amplitude  and  tolerances  for  the  clock  pulses  and  bias  lines  are  shown 
in  Figure  4-27  for  a particular  device. 
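The effect of the +10 volt offset can be illustrated with simple arithmetic. The sketch below shifts the φ4 and φ5 levels quoted for the full-adder test in Section 4.3 and expresses the TTL swing in the original (un-shifted) voltage frame. The levels actually used are those of Figure 4-27 and may differ; the function names are illustrative.

```python
OFFSET = 10.0  # volts added to the substrate bias and to every clock and control line


def shifted(level):
    """Ground-referred level after the +10 V offset is applied."""
    return level + OFFSET


def in_original_frame(ground_referred):
    """The same level expressed in the original (un-shifted) voltage frame."""
    return ground_referred - OFFSET


# Clock levels quoted in Section 4.3 (volts), as (upper level, lower level)
clocks = {"phi4": (-4.5, -8.7), "phi5": (-4.5, -24.0)}
for name, (upper, lower) in clocks.items():
    print(name, round(shifted(upper), 1), round(shifted(lower), 1))

ttl_low, ttl_high = 0.0, 4.5
print(in_original_frame(ttl_low), in_original_frame(ttl_high))
```

Seen from the un-shifted frame, the TTL swing of ground to 4.5 volts appears as a negative-going gate signal, which is the polarity the input gates were driven with before the offset was introduced.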

4.4.6  Characterization  Summary 

The  data  derived  from  the  tested  devices  is  summarized  in  Table  4-2.  The  circuit 
designs  tested  here  were  produced  to  demonstrate  the  functionality  of  the  arrays  and 
consequently  no  effort  was  made  to  optimize  the  design.  The  relatively  low  operating 
frequency  is  a direct  result  of  the  maximum  gate  length  used.  The  maximum  length  is 
much  longer  than  required  and  future  designs  will  be  built  for  operation  in  the  low 
megahertz  range.  It  has  already  been  mentioned  that  the  relatively  low  test  frequency 
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Figure  4-26.  Interconnections  and  DC  Voltage  Levels  Required  to  Drive  a 
CCD  Signal  Processing  Device  from  a TTL  Gate 
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Figure  4-27.  Amplitudes  and  Phase  Relationship  of  Clock  and  Data  Waveforms 
for  TTL  to  Adder  Array  Compatibility 
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(11  kHz)  would  be  expected  to  impose  a low  maximum  temperature.  The  low  frequency 
allows  more  time  for  thermal  carriers  to  accumulate  and  eventually  cause  an  error  in 
the  output.  Normally,  the  devices  are  expected  to  operate  near  1 MHz  and  at  this 
frequency  even  the  existing  design  would  operate  correctly  at  temperatures  exceeding 
+165°C.  In  future  designs  the  total  device  area  will  be  reduced  and  so  the  maximum 
operating  temperature  at  1 MHz  would  be  even  higher;  viewed  another  way,  at  any 
given  temperature  the  reduced  area  devices  will  allow  operation  at  lower  frequencies. 


Table 4-2. Preliminary Characteristics of Existing Design

Frequency Range
    Design goal:                                    20 kHz
    Maximum frequency with no output degradation:   50 kHz
    Maximum frequency with correct computation:     200 kHz

Maximum Temperature
    At 11 kHz:                                      +65°C
    At 700 kHz:                                     +125°C*

Input/Output
    Demonstrated to be TTL compatible

* Calculated from 11 kHz data.
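The report does not state how the 700 kHz entry was calculated from the 11 kHz data. One plausible reading, offered here only as an assumption, is that thermal leakage doubles for every fixed temperature increment while the time available for thermal carriers to accumulate scales as the inverse of the clock frequency. Under that assumed model the two table entries imply a doubling interval of about 10°C, as the sketch below shows; the function names are illustrative.

```python
import math


def doubling_interval(f1_hz, t1_c, f2_hz, t2_c):
    """Temperature rise (deg C) per doubling of leakage implied by two
    (frequency, maximum temperature) points, assuming leakage / f = constant."""
    return (t2_c - t1_c) / math.log2(f2_hz / f1_hz)


def max_temperature(f_hz, f_ref_hz=11e3, t_ref_c=65.0, interval_c=10.0):
    """Maximum temperature predicted by the assumed doubling model."""
    return t_ref_c + interval_c * math.log2(f_hz / f_ref_hz)


interval = doubling_interval(11e3, 65.0, 700e3, 125.0)
print(f"implied doubling interval = {interval:.1f} deg C")
print(f"model maximum at 500 kHz = {max_temperature(500e3):.0f} deg C")
```

This is a consistency check on the tabulated values only; it is not a substitute for measurements at the higher clock rates.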
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5.  PROCESS 


5.1  PROCESS  DESCRIPTION 

The substrates are N-type silicon wafers with a resistivity of 3 to 5 Ω-cm and <100> orientation to minimize the surface charge. Initially, a 15,000 Å oxide is grown by a dry oxygen, wet nitrogen, dry nitrogen oxidation cycle. The pattern for the channel is photoresisted, the oxide etched, and a 1000 Å gate oxide is grown at 920°C in an H2O atmosphere. This step is followed by the deposition of a 3500 Å polysilicon film which is phosphorus doped by gaseous diffusion. Next the polysilicon is covered by a Si3N4 film which is slightly oxidized. This step is followed by photoresisting and defining the polysilicon film by plasma etching. Thereafter an oxidation is performed to thicken the oxide existing in the channel region. This operation is required to ensure sufficient masking oxide in the channel to prevent boron penetration during the source and drain diffusion, which is carried out after the source and drain pattern is photoresisted and the oxide etched. Next the Si3N4 covering the polysilicon film is etched, followed by etching of the channel oxide. After this step, a 2000 Å thermal oxide film is grown covering both the channel region (under the Al) and the polysilicon layer, contacts are defined, the oxide is etched, Al is deposited, and the interconnections are defined by a photoresist step. Finally the circuits are subjected to a 450°C sinter in N2 followed by a 450°C sinter in H2. It should be noted that positive photoresist is utilized through all the processing.

This process is the basic Si-gate process which has been utilized with the DP-0 mask set. A metal step coverage problem was identified in this process. It is caused by the formation of a "gulch" under the polysilicon conductors in the field oxide region during the etching of the 1500 Å oxide covering the "metal" channel region (Figure 5-1) and by the steepness of the polysilicon film edge. This kind of step is very difficult to cover with the Al metallization, producing open metal lines. It was


Figure  5-1.  "Gulch"  Formed  Under  the  Polysilicon  Film 
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found  at  TRW  that  films  formed  by  thermal  decomposition  of  tetraethyl  ortho  silicate 
(TEOS)  produce  very  smooth  covers  in  steps  of  the  kind  just  described  (see  Figure  5-2). 
To  utilize  this  TEOS  deposition,  an  extra  mask  level  had  to  be  incorporated  in  the 
process  sequence.  Thus  the  DP-1  set  of  masks  had  this  extra  level  which  permits  the 
oxide  etching  in  the  channel  region  while  at  the  same  time  protecting  the  field  oxide 
region  which  is  where  the  step  coverage  problem  occurs. 


Figure  5-2.  "Gulch“  and  Steep  Polysilicon  Step  Covered  by  a TEOS  Film 

5.2  SPECIAL  PROCESSES 

5.2.1  Clean  Gate  Oxide  Technology 

TRW has developed clean oxidation methods to form SiO2 films on silicon in the
fabrication of the gate structure of CCD devices. It has been well established that
three major processes affect the electrical stability of thermally grown oxides:
substrate cleaning, oxidation, and metallization. Elaborate cleaning and metallizing
techniques have therefore been developed. However, the oxidation itself is the most
critical step by far. Early experiments using conventional oxidation systems showed
that reproducible fixed charge values could be obtained only with new quartz tubes;
however, these values deteriorated rapidly with time. Reproducible fixed charge values
were found when double wall quartz tubes were utilized and, as shown in Figure 5-3,
these values changed as a function of time of operation of the particular quartz tube,
the change being more pronounced for <111> than for <100> oriented substrates. Faster
aging was observed in oxides grown in Spectrosil than in GE 204 quartz. It has been
claimed that Spectrosil quartz is practically free from metallic impurities (total
concentration = 1 ppm); however, a recent analysis shows Na (20 ppm) as its main
impurity. GE 204 contains Al (40 to 50 ppm) and Na (20 to 30 ppm).

Figure 5-3. Fixed Charge as a Function of Quartz Tube Age

TRW's results also indicate that by very fast quenching to room temperature, the
fixed charge can be eliminated. Moreover, when oxides with small values of the fixed
charge (1 x 10^ cm^-2) were quenched in N2 to -20°C, or annealed in N2 at a low
temperature (550°C), the fixed charge could be eliminated and experimental curves near
the theoretical values obtained. It is suspected that Al and Na atoms in the oxide
originating in the quartz tube are causing the observed aging phenomena. The
quenching-dependent changes of the fixed charge may thus be caused by an
interstitial ⇌ substitutional reaction of Al atoms in the SiO2 network lattice.


5.2.2  Ion  Implantation 


Implantation of impurity ions into semiconductors from a high energy accelerator (30
to 200 keV) has many attractive features for improved CCD manufacturing control. It
offers a means of introducing precisely measured quantities of a single species of
doping impurity in a way that far surpasses what can be done with a thermal diffusion
furnace. The accelerator is used as an adjunct to the conventional methods, and the
process still involves impurity distribution by thermal means. TRW has in operation
a 200 keV ion accelerator which is presently set up to implant boron, phosphorus, and
arsenic.

The ion dose can be controlled only as accurately as the following factors allow:
the degree to which the ions are singly charged, the effectiveness of suppression
of secondary electrons, and the electronic limitations of the integration system.
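
The role of charge state and current integration in dose accuracy can be illustrated with the standard relation between integrated beam charge and dose, dose = Q / (n q A). The following is a minimal sketch under stated assumptions, not a description of the TRW integrator; the function names and the example current, time, and scanned area are illustrative and do not come from this report.

```python
# Sketch only: function names and example numbers are illustrative assumptions.
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one singly charged ion, coulombs


def implanted_dose(beam_current_a, time_s, area_cm2, charge_state=1):
    """Dose in ions/cm^2 inferred from the integrated beam charge."""
    if area_cm2 <= 0 or charge_state < 1:
        raise ValueError("area must be positive and charge_state at least 1")
    integrated_charge_c = beam_current_a * time_s
    return integrated_charge_c / (charge_state * ELEMENTARY_CHARGE_C * area_cm2)


def apparent_to_true_dose_ratio(fraction_doubly_charged):
    """Over-count factor when the integrator assumes every ion is singly charged.

    A doubly charged ion deposits twice the charge, so it is counted as two ions.
    """
    if not 0.0 <= fraction_doubly_charged <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return 1.0 + fraction_doubly_charged


if __name__ == "__main__":
    dose = implanted_dose(beam_current_a=1e-6, time_s=100.0, area_cm2=10.0)
    print(f"dose = {dose:.3e} ions/cm^2")
    ratio = apparent_to_true_dose_ratio(0.05)
    print(f"over-count with 5% doubly charged ions: {ratio:.2f}x")
```

Secondary electrons leaving the target would add a similar error term to the integrated charge, which is why their suppression appears in the list above.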
Uniformity of the dose produced by ion implantation is determined by linearity
of  the  beam  scanning  process  and  suppression  of  unscanned  neutrals.  Linearity  of 
the  scanning  process  is  largely  a matter  of  good  electronics  design,  while  suppression 
of  neutrals  is  a matter  of  filtering  out  as  many  neutral  particles  as  possible  prior 
to  deflection  and  maintaining  good  vacuum  in  the  region  of  the  scan  plates. 

To a first approximation, implanted ions are distributed in a Gaussian shape.
The mean value and standard deviation of this distribution are determined in a
predictable manner from ion energy and mass of the substrate atoms. The major deviation
from this Gaussian distribution is caused by a phenomenon called channeling, in which
ions are guided deep into the material parallel to major crystalline axes. Channeling
can be controlled by proper alignment of the crystal with the ion beam direction,
or by implanting through a thin amorphous layer of oxide or nitride on the wafer
surface.
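
The first-order profile described above is commonly written as follows. The symbols are standard notation introduced here for illustration and are not taken from this report: \Phi is the implanted dose, R_p the mean (projected) range, and \Delta R_p the standard deviation (straggle).

```latex
N(x) = \frac{\Phi}{\sqrt{2\pi}\,\Delta R_p}
       \exp\!\left[-\frac{(x - R_p)^2}{2\,\Delta R_p^{2}}\right],
\qquad
N_{\max} = N(R_p) = \frac{\Phi}{\sqrt{2\pi}\,\Delta R_p}
         \approx \frac{0.4\,\Phi}{\Delta R_p}
```

Channeling appears as a tail beyond this Gaussian on the deep side of the peak.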

5.2.3 Polycrystalline Silicon Technology

Polycrystalline silicon films are used as gate electrodes in our CCD technology.
Owing to the self-aligning features of the processing, and since they form a very
critical portion of the device structure, the chemical and physical structure of these
films has to be very well controlled. Thus deposition, doping, and etching have to be
carefully determined. These films are usually deposited by the thermal decomposition
of silane in rf-heated, horizontal epitaxial reactors in a hydrogen carrier gas.
Their crystalline structure is very sensitive to deposition conditions, and it is the
film's crystalline structure which determines its electrical characteristics. We have
found at TRW that 650°C is the optimum deposition temperature for polycrystalline
films. Films prepared at higher temperature usually show poor crystalline perfection.
The deposition rate at 650°C is approximately 0.1 µm/minute. Since these films are
used both as conductors and diffusion masks, no doping impurities are added
intentionally during deposition. Conventional gaseous diffusion techniques have been
applied to dope these films: BBr3 is used to diffuse boron and POCl3 is used to
diffuse phosphorus. The diffusion temperature is 950°C.

5.2.4  TEOS 

Silicon oxide films produced by the pyrolytic decomposition of tetraethyl
orthosilicate (TEOS) have been used as masking oxides and to thicken field oxides. The
thermal decomposition is carried out at 725°C in a low vacuum (1 µ). Immediately after
deposition, these films have relatively low density and their etch rate in conventional
buffered HF solution is very high (10:1 compared to thermal oxides). They are usually
"densified" at elevated temperatures to produce films comparable in etch rate to
thermal oxides.

These films tend to produce very good step coverage over polysilicon steps,
thereby preventing "shadowing" during deposition of the Al films used for pads
and interconnections. Figure 5-4b shows a scanning electron microscope picture of
polysilicon gates covered with TEOS, as compared to the same kind of gate covered by
silox deposition in Figure 5-4a. Clearly shown is the gentle slope obtained with
TEOS as compared to the rounded oxides obtained with silox.

5.3  BASIC  PROCESSING  STEPS 

The following is a list of the major process sequence steps:

1. N-type wafers <100>, 3 to 5 Ω-cm
2. Field oxide grown 15,000 Å
3. Photoresist channel
4. Etch channel oxide
5. Grow 1000 Å gate oxide (Figure 5-5a)
6. Deposit 3500 Å polysilicon film at 650°C
7. Diffuse P into the polysilicon; POCl3 source at 950°C
8. Deposit 2000 Å nitride
9. Photoresist polysilicon (Figure 5-5b)
10. Etch oxide
11. Plasma etch nitride and polysilicon films
12. Reoxidize channel oxide to 1500 Å
13. Photoresist source and drain
14. Etch oxide
15. Predeposit and diffuse boron; BBr3 source at 950°C (Figure 5-5c)
16. Etch Si3N4.

Steps 1 through 16 are common to DP-0 and DP-1 processing. DP-0 processing continues
as follows:

17. Etch oxide
18. Grow 2000 Å channel oxide (Figure 5-5d)
19. Photoresist contacts
20. Etch oxide
21. Evaporate Al
22. Photoresist and define Al (Figure 5-5e)
23. Sinter 450°C N2 and H2.
Figure 5-4. Polysilicon Gates (X4000): a) Covered by 13,000 Å Silox Film; b) Covered by 13,000 Å TEOS Film

Figure 5-5. Steps in the DP-1 Process
DP-1 processing continues as follows after step 16:

24. TEOS deposition; 725°C in vacuum
25. Densify and getter TEOS
26. Photoresist polysilicon protect
27. Etch oxide
28. Grow 2000 Å channel oxide (Figure 5-5d)
29. Photoresist contacts
30. Etch oxide
31. Evaporate Al
32. Photoresist and define Al (Figure 5-5e)
33. Sinter 450°C N2 and H2.

6.  SIGNAL  PROCESSING  INTERFACE  STUDIES 


6.1  INTRODUCTION 

6.1.1  Background 

Networks of computers used for signal processing exhibit several unique features
by comparison to networks of computers used for data processing or for real-time control
(e.g., avionic data bus systems). The most obvious difference is that signal processing
computers generally may be arranged to process data which is grouped into blocks,
e.g., 256 words. This means that a substantial amount of time (1 to 10 msec) is
required to transmit data between processors, and that a circuit switching
interconnection scheme which requires, say, 10 to 100 µsec to establish a data path is
acceptable.
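
A short worked estimate shows why the path setup time is tolerable. It reads the setup time as 10 to 100 microseconds and the block transfer time as 1 to 10 milliseconds; the symbol \eta for the fraction of link time spent on setup is introduced here for illustration only.

```latex
\eta = \frac{t_{\text{setup}}}{t_{\text{setup}} + t_{\text{transfer}}},
\qquad
\eta_{\text{worst}} = \frac{100\ \mu\text{s}}{100\ \mu\text{s} + 1\ \text{ms}} \approx 9\%,
\qquad
\eta_{\text{best}} = \frac{10\ \mu\text{s}}{10\ \mu\text{s} + 10\ \text{ms}} \approx 0.1\%
```

Even in the least favorable combination, establishing the circuit costs less than a tenth of the time the link is held.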

The  second  unique  aspect  of  signal  processing  networks  is  the  irregularity  of 
typical  network  topology.  The  system  designer  needs  freedom  to  adjust  his  system  to 
the  problem  requirements.  It  is  desirable  to  be  able  to  connect  the  network  of  signal 
processors  in  the  same  flow  form  as  the  problem  exhibits.  An  important  aspect  of  this 
is  the  ability  to  optimize  redundancy  in  each  specific  network.  Redundancy  is  increased 
(at  a cost  of  increased  hardware  complexity)  by  providing  more  than  one  signal  path 
between  important  network  resources.  If  any  portion  of  one  path  fails,  an  alternate 
path  still  exists. 

The  third  distinguishing  aspect  of  signal  processing  networks  is  the  frequent 
need  to  process  data  in  a pipeline  fashion.  This  arises  because  of  the  relative 
slowness of most signal processing computers with respect to the input data rate. The
signal  processing  network  designer  is  not  permitted  the  luxury  of  transferring  a single 
block  of  data  at  a time;  many  blocks  must  be  transferred  simultaneously  within  the 
system. 

6.1.2  Overview 

The network organization which evolved during this study is called the transparent
interface system. In the taxonomy of Anderson and Jensen,* it is an irregular network
of signal processors interconnected with dedicated paths using decentralized routing
(Figure 6-1). Specifically, the network consists of an assortment of signal processing
elements, each of which is connected to a switching node (or node processor). The
switching nodes are interconnected with communication links.


*G. A. Anderson and E. Jensen, "Computer Interconnection Structures: Taxonomy,
Characteristics and Examples," Computing Surveys, Vol. 7, December 1975, pp. 197-213.
Figure 6-1. Network Organization

This  study  defines  the  processing  required  at  the  switching  node.  The  node 
processor  can  be  implemented  in  any  of  several  LSI  technologies  depending  on  perfor- 
mance requirements  and  optimization  criteria  (size,  power,  development  cost,  reli- 
ability, etc.).  The  switching  node  may  be  implemented  efficiently  using  the  cross- 
point  switch  and  the  path  selection  network  described  in  detail  in  Section  6.4  of 
this  report.  Baseline  nodes  would  have  up  to  eight  data  ports,  one  of  which  connects 
to  the  "underlying"  signal  processor. 

The  only  topological  restriction  imposed  on  the  system  designer  is  that  only  one 
link  can  be  connected  between  any  pair  of  nodes  (i.e.,  links  cannot  be  paralleled).  In 
all  other  respects  the  designer  enjoys  complete  freedom. 

System  control  is  distributed  since  each  node  contains  an  interconnection  table 
which  defines  the  complete  system.  This  avoids  the  failure  sensitivity  of  networks 
which  have  centralized  control.  Although  the  latter  may  be  more  efficient  in  terms  of 
hardware,  a single  failure  in  the  centralized  controller  can  disable  the  entire  network. 
The  distributed  control  network  exhibits  desirable  fail-soft  characteristics;  although 
a single  failure  may  cripple  a portion  of  the  network,  the  remainder  functions  normally. 
Distributed  control  facilitates  simultaneous  utilization  of  many  links.  This  is  sim- 
plified by  use  of  a nonhierarchical  crosspoint  switch  at  each  node  which  allows  multiple 
links  to  cross  at  each  node. 

Dynamic,  associative  priority  is  implemented;  priority  may  change  from  message  to 
message  or  within  a message.  If  an  interferer  with  sufficient  priority  requests  a link 
which  is  in  use,  interrupts  are  sent  to  the  link  users  informing  them  that  the  link 
must  be  relinquished  within  a time  period  determined  by  network  protocol. 
6.1.3 Summary


The  preliminary  system  design  is  complete  for  the  transparent  interface  system 
and  is  described  in  Section  6.2.  The  system  has  been  simulated  with  a discrete  event 
simulator  as  described  in  Section  6.3;  the  simulation  was  used  to  eliminate  potential 
deadlock  problems.  Two  LSI  circuits  have  been  designed  as  described  in  Section  6.4 
to  facilitate  the  implementation  of  efficient  signal  processing  interconnection  systems. 
The  crosspoint  switch  circuit  is  of  general  applicability  to  all  forms  of  circuit 
switching  networks,  while  the  path  selection  logic  is  optimized  for  this  system.  The 
transparent interface system presented here allows the signal processing designer to
concentrate on solving the important signal processing problems; little effort need
be expended on the interconnection problem, which has been solved in an efficient manner
that is directly suitable for LSI implementation.

6.2  DESCRIPTION  OF  TRANSPARENT  INTERFACE  SYSTEM 

In  this  section  a message  routing  algorithm  is  presented  which  can  be  used  for 
any  type  of  network  topology  ranging  from  in-line  to  fully  matrixed.  This  allows  the 
system  designer  great  flexibility  in  choosing  an  interconnection  scheme. 

The  topology  of  a network  determines  the  amount  of  redundancy  and  failure  sensi- 
tivity of  the  system.  Redundancy  is  increased  by  adding  alternate  paths  between  the 
important  nodes.  Then,  if  an  intermediate  node  along  the  primary  path  fails,  or  if 
one  or  more  of  the  links  in  the  primary  path  is  already  in  use,  an  alternate  path 
exists.  Fault  tolerance  is  introduced  not  only  through  the  use  of  multiple  paths,  but 
also  by  distribution  of  control  throughout  the  network.  Thus  the  system  does  not  re- 
quire survivability  of  a central  controller.  Even  if  some  nodes  fail,  the  system  will 
still  be  functional.  Note,  however,  that  certain  communication  paths  may  no  longer 
be  usable.  Each  node  is  independent;  there  is  no  common  data  structure  or  master 
controller. 

Another  feature  of  this  system  is  that  several  communication  paths  can  be  used 
simultaneously.  For  example,  a node  can  have  several  paths  passing  through  it  at 
once.  This  is  due  to  the  multiple  bus  circuit-switching  technique  which  is  used  for 
routing  information  from  source  to  destination. 

Deadlock  does  not  develop  in  this  system  because  of  the  way  in  which  the  path 
selection  algorithm  is  implemented.  After  a node  controller  has  decided  on  a primary 
path  for  communication,  it  establishes  the  path  by  acquiring  one  link  at  a time.  If 
any  link  is  tied  up,  the  node  controller  remembers  which  one  it  is,  frees  all  the 
links  that  have  been  acquired,  and  formulates  an  alternate  path.  Deadlock  would  occur 
if  a node  controller  were  to  hold  onto  the  links  it  had  previously  acquired  while 
waiting  for  the  busy  links  it  needs.  In  the  proposed  system  if  no  alternate  path 
exists,  the  node  controller  can  either  wait  and  retry  the  primary  path,  or  increase 
the  priority  of  its  message  and  retry  the  path. 
Flexibility is achieved by permitting a node controller to change its message
priority at any time. Dynamic priority gives a node the ability to match priority to
importance for different parts of a message. For example, if the message contains the
status of a remote detection unit, some of the data will be of a housekeeping nature
(e.g., temperature), which is not nearly as important as the data produced during
the detection of a high priority event.

The assumed system ground rules for path selection are:

1.  All  communication  links  provide  two-way  communication. 

2.  A node  can  connect  multiple  paths  simultaneously  (i.e.,  cross  bus). 


3.  The  best  path  contains  a minimum  number  of  links. 


4.  If  two  or  more  paths  contain  the  same  number  of  links,  either  path  is 
equally  desirable. 


Two data structures are used to select a path: the path matrix and the threaded stack.
The path matrix identifies all the interconnections between the nodes comprising the
network (i.e., which nodes are interconnected). This matrix is resident in each node
controller and may be updated during the path acquisition sequence. The threaded stack
is used for the actual path selection. For any given node the stack is loaded by
entering all those nodes adjacent to it, followed by those nodes which are two links
away, and so on until the destination node is found. When a node name is entered in the
stack, its connection to a prior node in the stack is indicated by the link field. The
link field contains the stack address of the prior node. The path to the given node is
found by threading back through the links to the originator.


6.2.1  Path  Matrix  Operations 


The path matrix is the fundamental data structure for the entire system and
therefore must be kept accurate at all times. If the network topology never changes,
there is no need to burden the node controller with updating the matrix. In this case,
the path matrix can be stored in a permanent read-only memory, such as a ROM. However,
if the topology is expected to change due to node failures or addition and deletion of
nodes, the path matrix must be stored in a read/write memory, such as a RAM. This
section deals with two problems: creation of the path matrix upon power-up and regular
updating of the path matrix to reflect recent network changes.


Suppose we have the network of Figure 6-2 and that we are node X4. Upon power-up
we must determine the path matrix for this system. We will also assume that the node
controller knows which ports are active. Each node has a fixed maximum number of ports
(e.g., eight), but not all of the ports will be used at any given time. For this
reason, a status bit is associated with each port to indicate whether or not that port
is currently being used. The node controller sends a "who are you?" command from each
active port to the adjacent nodes. These interrogated nodes will then reply with their
names. Thus, X4 sends "who are you?" through his only active port. This message is
received and processed by X1, who responds with "I am X1." This information is now
stored in matrix form as in Figure 6-3. A connection between two nodes is indicated by
placing a "1" in the proper position. Notice that the matrix position (X4, X4) is zero
because X4 cannot talk to itself.


Figure 6-3. Path Matrix After Interrogating Neighbors

After a response has been received from each active port, the node controller
interrogates each neighbor to determine its neighbors. Thus, X4 selects a path to X1
by executing the path selection algorithm and acquires the path to X1. After
completing the path, X4 asks X1 "who are you connected to?" Node X1 will reply with a
list of his nearest neighbors. In the event that X1 has not completed his list (e.g.,
upon simultaneous start-up of every node in the system), X1 will reply with a partial
list response which provides X4 with X1's present information, but also warns X4 that
X1 must be interrogated again later to complete that section of the path matrix.

Assuming that X1 has completed the list of his closest neighbors, he would
respond to X4's question with "my neighbors are X4, X2, and X3." Node X4 would then
add this new information to the path matrix, which would result in the matrix of
Figure 6-4.

This process of filling in vacant rows in the path matrix continues by forming
a path to the node representing the next incomplete row in the matrix, which is X2.
Once X4 has constructed a path to X2 he then asks "who are you connected to?" and X2
will respond with his list of neighbors. This information is inserted into the path
matrix to yield a more complete picture of the network (Figure 6-5).


Figure  6-5.  Path  Matrix  After  Building  a Path  to  X2 
and  Asking  Him,  "Who  Are  Your  Neighbors?" 

The path matrix construction terminates when the lists of neighbors sent back
from interrogated nodes contain no new node names. The final path matrix, from X4's
point of view, is shown in Figure 6-6. Notice that since all communication links are
bidirectional, the matrix is symmetric about its diagonal. If, on the other hand, some
of the links were unidirectional or if some line drivers had failed, the matrix would be
asymmetric. For example, if X4 could send data to X1, but not vice versa, the
position (X4, X1) would be "1" while (X1, X4) would be "0".
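
The construction procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the node controller's actual firmware: the topology below is inferred from the worked example in the text (the figure itself is not reproduced), and the names `NETWORK`, `who_are_you_connected_to`, and `build_path_matrix` are hypothetical. Path acquisition and partial-list replies are omitted.

```python
from collections import deque

# Assumed topology, inferred from the worked example in the text.
NETWORK = {
    "X1": ["X4", "X2", "X3"],
    "X2": ["X1", "X3", "X5"],
    "X3": ["X1", "X2", "X5"],
    "X4": ["X1"],
    "X5": ["X2", "X3"],
}


def who_are_you_connected_to(node):
    """Hypothetical stand-in for the interrogation reply of one node."""
    return list(NETWORK[node])


def build_path_matrix(origin, ask=who_are_you_connected_to):
    """Fill in one matrix row per interrogated node until no new names appear."""
    known = [origin]            # node names in order of discovery
    rows = {}                   # completed rows: node -> set of neighbors
    pending = deque([origin])   # nodes whose rows are still incomplete
    while pending:
        node = pending.popleft()
        reply = ask(node)       # for the origin this models its own port scan
        rows[node] = set(reply)
        for name in reply:
            if name not in known:
                known.append(name)
                pending.append(name)
    matrix = {r: {c: int(c in rows[r]) for c in known} for r in known}
    return known, matrix


if __name__ == "__main__":
    order, matrix = build_path_matrix("X4")
    print("     " + "  ".join(order))
    for row in order:
        print(row + "   " + "   ".join(str(matrix[row][col]) for col in order))
```

Because every link in the assumed topology is bidirectional, the resulting matrix is symmetric; a one-way link would be modeled by listing the neighbor in only one of the two replies.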
Figure 6-6. Final Path Matrix


6.2.2  Path  Selection 

Path  selection  is  a fairly  straightforward  procedure  using  the  path  matrix  and 
threaded  stack.  Using  the  example  network  shown  in  Figure  6-2,  we  shall  show  how  a 
path  is  created. 

Assume that node X4 has a message for X5. By looking at the network topology we
see that there are two paths to X5 from X4, both of which are equally desirable
(since the assumed optimization criterion is to minimize the number of links).

Each word on the path stack is divided into two parts: a node name and a link back
to a previous stack member, as shown in Figure 6-7. To initiate a path search, the
first word placed on the stack contains the originator's name and a null link. Since
every node can only be placed on the stack once, each node pushed is marked. After
X4 has been marked, its row in the scratchpad path matrix (which is simply a copy of
the "permanent" path matrix) is scanned for "1"s. Each time a "1" is found, the node
associated with that column is pushed on the stack and linked to the node associated
with the row which is presently being scanned, i.e., X4. The link concatenated with
X1 will be X4's stack position (Figure 6-8).
Figure 6-7. Stack Word (node name and link fields; the initial stack word is the originating node)

Figure 6-8. Stack After X4's Row in Path Matrix Has Been Connected

Since X1 is not the destination node, we must continue the search by putting any
new candidates connected to X1 on the stack. So, scanning X1's row in the path matrix
we find X1 is connected to X4, X2, and X3. But X4 has already been used, so only X2
and X3 are entered (Figure 6-9).

Figure 6-9. Stack After Scanning X1's Row in Path Matrix
The row belonging to the next point-of-search node in the stack, X2, must now be
examined for nodal connections. From the scratchpad path matrix, X2 has connections
to X3 and X5. Since node X3 has already been used, it is neglected and X5 is pushed
on the stack and linked back to X2. We have finally found the destination node
(Figure 6-10).

The  path  used  to  access  this  node  is  formulated  by  threading  back  through  the 
linked  list  of  nodes  which  uses  the  destination  node  as  a list  head.  Thus,  the  path 
to  X5  from  X4  is  X5-X2-X1-X4,  as  shown  in  Figure  6-11. 
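
The search just described is a breadth-first scan in which each stack word records the stack position of the node it was reached from. The sketch below is one way to express it, assuming the topology implied by the worked example; the names `make_matrix`, `select_path`, and `thread_back` are hypothetical and not taken from this report.

```python
# Assumed topology, inferred from the worked example in the text.
NODES = ["X1", "X2", "X3", "X4", "X5"]
LINKS = [("X4", "X1"), ("X1", "X2"), ("X1", "X3"),
         ("X2", "X3"), ("X2", "X5"), ("X3", "X5")]


def make_matrix(nodes, links):
    """Build a symmetric path matrix (all links two-way) as nested dicts."""
    matrix = {row: {col: 0 for col in nodes} for row in nodes}
    for a, b in links:
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


def thread_back(stack):
    """Follow link fields from the last stack word back to the originator."""
    path = []
    position = len(stack) - 1
    while position is not None:
        node, position = stack[position]
        path.append(node)
    return path


def select_path(matrix, origin, destination):
    """Return the path from destination back to origin, or None if unreachable."""
    stack = [(origin, None)]    # first word: originator's name and a null link
    marked = {origin}           # each node may be placed on the stack only once
    scan = 0                    # stack position of the point-of-search node
    while scan < len(stack):
        row_node = stack[scan][0]
        for col_node, connected in matrix[row_node].items():
            if connected and col_node not in marked:
                marked.add(col_node)
                stack.append((col_node, scan))   # link = position of row node
                if col_node == destination:
                    return thread_back(stack)
        scan += 1
    return None


if __name__ == "__main__":
    print("-".join(select_path(make_matrix(NODES, LINKS), "X4", "X5")))
```

Because nodes are scanned in the order they were pushed, the first time the destination appears it has been reached over a minimum number of links, which matches ground rule 3.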
PROPOSED PRIMARY PATH: X5-X2-X1-X4

Figure  6-11.  Path  Threading 


Now that a path has been proposed, X4 must acquire each link comprising the path
one at a time. This is done by sending out "get link" (GTLNK) commands to the
appropriate nodes. Initially X4 sends GTLNK, X1 to X1 since that is the first
intermediate node in the path. Since the link is not currently being used, X1 replies
with a "link established" (LINKES). Having acquired the initial link, X4 now tries for
the link connecting X1 and X2 by issuing a GTLNK, X2 to the channel connected to X1.
Upon receiving the message, X1 first checks the status of the port leading to X2. If
that port is free, X1 sends the original message to X2 and physically connects the port
leading to X2 with the port leading to X4 using the crosspoint switch. Assuming X2
does not wish to use the link sought after by X4, he will respond with LINKES, which
is passed through node X1 to X4.

Node X4 completes the path by sending a GTLNK, X5 along the unfinished path.
This message passes to X2, which performs the check on the appropriate port and passes
the message on to X5. If X5 responds with LINKES, then the path is complete. X4 must
now send a request to use the processor interfaced to X5 (GETPRC). If X5's processor
is busy the response will be PRCBSY; otherwise it will be PRCRDY.

If a processor is busy, the originator sends a "broadcast: free link" (BFRLNK)
command, which causes all the nodes along the completed path to relinquish the path
segments. At this point, the node controller can do one of two things:

1) If another node is in the system with an underlying processor that
performs the same function as the processor at node X5, a path to the alternate
node can be built.

2) If the processor at node X5 performs a unique function, the node
controller must either throw the data away or keep trying X5 until he
receives a PRCRDY.

In the above example we assumed that all the links requested by X4 were free.
However, a desired link may be in use by another node controller who is sending a
higher priority message. Let us assume that the response to GTLNK, X5 was "link busy"
(LNKBSY) and that X4 does not wish to increase the priority of his message. Node X4
must enter the busy information into the scratchpad path matrix (since the link will
be busy only temporarily) by setting the positions (X2, X5) and (X5, X2) to "0". The
links already acquired (i.e., all acquired prior to receiving the LNKBSY) are released
by issuing a BFRLNK. If the processor waited for the busy link to become free, a
deadly embrace situation could occur.

X4 reruns the path selection algorithm with this new information injected into
the scratchpad path matrix. Although most of the information left in the old stack is
good, it is harder to delete the bad information than it is to start over again. The
stack creation process remains the same up to the point at which X2's row is scanned.
The result will now be that X2 is only connected to X3, but X3 has already been pushed
on the stack. Thus, X2 is a dead end and X3's row is examined for possible links.
The first unused connection in X3's row is the (X3, X5) position. Once again the
destination has been reached, and a new path is proposed as shown in Figure 6-12.


PROPOSED  ALTERNATE  PATH:  X5-X3-X1-X4 

Figure  6-12.  Threading  Alternate  Stack 

X4 again tries to establish the new proposed path. If the new path cannot be
completed, X4 marks the busy link in the scratchpad path matrix and tries once more.
X4 keeps trying until it is not possible to reach the destination node. X4 can then
wait and try again from the beginning, or raise the message priority.
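
The acquire, release, and retry cycle can be sketched as follows. This is an illustration under stated assumptions rather than the protocol implementation: the topology is inferred from the worked example, `link_is_busy` is a hypothetical stand-in for the GTLNK exchange (True models an LNKBSY reply, False models LINKES), and priority handling and the GETPRC step are omitted.

```python
import copy

# Assumed topology, inferred from the worked example in the text.
NODES = ["X1", "X2", "X3", "X4", "X5"]
LINKS = [("X4", "X1"), ("X1", "X2"), ("X1", "X3"),
         ("X2", "X3"), ("X2", "X5"), ("X3", "X5")]


def make_matrix(nodes, links):
    matrix = {row: {col: 0 for col in nodes} for row in nodes}
    for a, b in links:
        matrix[a][b] = matrix[b][a] = 1
    return matrix


def select_path(matrix, origin, destination):
    """Compact threaded-stack search; returns origin-to-destination list or None."""
    stack, marked, scan = [(origin, None)], {origin}, 0
    while scan < len(stack):
        for col, bit in matrix[stack[scan][0]].items():
            if bit and col not in marked:
                marked.add(col)
                stack.append((col, scan))
                if col == destination:
                    path, pos = [], len(stack) - 1
                    while pos is not None:
                        node, pos = stack[pos]
                        path.append(node)
                    return path[::-1]
        scan += 1
    return None


def acquire_path(permanent_matrix, origin, destination, link_is_busy):
    """Try successive paths; never hold links while waiting on a busy one."""
    scratch = copy.deepcopy(permanent_matrix)    # scratchpad path matrix
    while True:
        path = select_path(scratch, origin, destination)
        if path is None:
            return None      # caller may wait and retry, or raise the priority
        held = []
        for a, b in zip(path, path[1:]):
            if link_is_busy(a, b):
                scratch[a][b] = scratch[b][a] = 0    # remember the busy link
                held.clear()                         # models BFRLNK
                break
            held.append((a, b))
        else:
            return path      # every link answered LINKES


if __name__ == "__main__":
    busy = {frozenset(("X2", "X5"))}
    result = acquire_path(make_matrix(NODES, LINKS), "X4", "X5",
                          lambda a, b: frozenset((a, b)) in busy)
    print(result)
```

Each failed attempt removes at least one link from the scratchpad copy, so the loop always terminates, and the permanent matrix is left untouched for the next message.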
6.2.3 Flowcharts


The algorithm implemented for path matrix construction is shown in Figure 6-13.
All matrix locations are assumed to contain a zero initially, since there are expected
to be fewer connections than nonconnections.

The matrix updating process is merely a subset of the path matrix construction
procedure. If X4 is suspicious of the condition of some node in the system, he can
send out a "who are you connected to?" to the node in question and use the response
to make any necessary changes in the path matrix. If the network has a dynamic
topology, all the nodes in the system must periodically send out interrogations to all
other nodes in the system to discover additions or deletions to the network. If it
is undesirable to have the nodes constantly updating their matrices, then one node
in the system, acting as a monitor, could inform all other nodes that a change has
been made to the system and that they should refresh their matrices. In this way,
the path matrix updating process would be executed only when there is a change in the
network topology.

Figure 6-13. Path Matrix Construction
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The  path  selection  algorithm  is  presented  in  detail  in  the  flowchart  of  Figure 
6-14.  Notice  that  the  flowchart  is  exited  if  no  connection  is  possible.  At  this  time, 
the  node  controller  could  keep  trying  the  same  node,  try  a different  node,  raise  the 
priority,  or  reject  the  message.  Another  possibility  is  that  the  desired  node  is  no 
longer  active  in  the  system  due  to  failure  or  deliberate  deletion  of  a node  in  the 
system.  If  that  is  the  case,  it  would  be  appropriate  to  inform  the  other  nodes. 


Figure 6-14. Path Selection


Figure  6-15  shows  the  overall  operation  of  the  node  controller  including  the 
path  matrix  construction  and  the  path  selection  routines. 

6.2.4  Recommendations  for  Future  Work 

Two  problem  areas  remain  unresolved: 

1)  Startup  procedures  for  systems  with  dynamic  path  matrices 

2) Protocols for the processor which "underlies" the node controller.

For the first problem, it was noted in Section 6.2.1 that partial lists might be
exchanged during startup. If all nodes are powered up at about the same time, it is
unclear whether the path matrices will converge to complete information. This should be
investigated analytically prior to use of this scheme with dynamic path matrices. Use
of fixed or semifixed path matrices (i.e., a single network controller reloads all
path matrix RAMs whenever a network change occurs) is a solution; however, the
question remains unanswered.

Figure 6-15. Flowchart of Overall Operation

The  second  problem  leads  to  the  question  of  what  information  an  "underlying" 
processor  needs  to  effectively  use  this  networking  scheme.  At  one  extreme  a processor 
might  specify  the  destination  processor  for  each  data  packet.  If  a path  cannot  be 
completed  a simple  "path  not  completed"  message  could  be  the  response.  At  the  other 
extreme  a processor  might  simply  denote  the  type  of  processor  to  which  the  data  is 
to  be  routed.  The  latter  approach  could  be  implemented  easily  by  adding  a simple 
processor  between  the  "underlying"  processor  and  the  node  controller.  The  same  effect 
can  be  obtained  in  the  former  case  if  the  underlying  processor  is  a general  purpose 
machine  with  sufficient  reserve  to  add  a network  resource  selection  program. 

6.3  SIMULATION  RESULTS 

Using the interface system described in Section 6.2, a simulation was conducted
to reveal the operational characteristics and to indicate whether deadlock situations
might arise. A simple SONAR problem was used for the simulation. The simulation was
performed in SALSIM (System Analysis Language for SIMulation), a TRW-developed discrete
event simulator. SALSIM is similar to the well-known simulator SIMSCRIPT. It is
written in FORTRAN IV and runs on the TRW timesharing system using a CDC 6600 computer.

As described in detail in Section 6.3.1, no deadlocks occurred within the system.
The overhead due to the interface system was on the order of 32 percent, although
the network had not been specifically optimized for the problem. With minor changes,
overhead on the order of 5 percent should be achievable.

6.3.1  Example 


The  problem  which  was  used  for  the  simulation  is  a simplified  SONAR  processor, 
which  is  described  in  detail  below.  A solution  using  20  nodes  and  30  communication 
links  was  simulated  as  described  in  the  following  section. 

SONAR  System  Description 

The basic SONAR processing example may be understood by referring to Figure 6-16.
Data acquired by hydrophones is digitized by the A/D converters, filtered through the
digital filters, transformed by the FFTs, accumulated and averaged in the ensemble
averaging unit, and finally displayed. In view of the block orientation of our system,
the A/D converters are assumed to have buffers which accumulate 256 samples per
hydrophone channel. The samples here and in the remainder of the system are 32-bit complex
words (16 bits in-phase and 16 bits quadrature). Assuming a 128-hydrophone system
(64 hydrophones per A/D converter) with each hydrophone sampled every 0.4 msec (at the
Nyquist rate for 1250 Hz), each of the two A/D converters will generate a block of data
every 1.6 msec. The digital filters are assumed capable of processing a block of data
in 25.6 μsec (100 nsec/data point), the FFT transforms a block of data in 200 μsec,
and the ensemble averager is assumed to require 256 μsec per block of data. The
display is assumed to have access to the ensembles of data on a noninterfering basis.

The final parameter of interest is the time required to transmit a block of data
from one unit to another: 256 μsec.
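The timing figures quoted above follow from a few lines of arithmetic, sketched below. The constant names are illustrative, and the last quantity (the link rate implied by moving one 256-word block in the stated transmission time) is a derived value that the report does not state.

```python
# Arithmetic behind the SONAR example parameters (names are illustrative).
SAMPLE_PERIOD_S = 0.4e-3          # each hydrophone sampled every 0.4 msec
SAMPLES_PER_BLOCK = 256           # samples buffered per hydrophone channel
CHANNELS_PER_CONVERTER = 64       # hydrophones per A/D converter
BITS_PER_SAMPLE = 32              # 16 bits in-phase + 16 bits quadrature
FILTER_TIME_PER_POINT_S = 100e-9  # 100 nsec per data point
TRANSMIT_TIME_S = 256e-6          # time to move one block between units

sample_rate_hz = 1.0 / SAMPLE_PERIOD_S                    # 2500 Hz
nyquist_hz = sample_rate_hz / 2.0                         # 1250 Hz
channel_block_time_s = SAMPLES_PER_BLOCK * SAMPLE_PERIOD_S  # 102.4 msec per channel
# 64 channels each fill a block every 102.4 msec, so one converter
# emits a block on average every 1.6 msec.
block_period_s = channel_block_time_s / CHANNELS_PER_CONVERTER
filter_time_s = SAMPLES_PER_BLOCK * FILTER_TIME_PER_POINT_S  # 25.6 usec
# Derived, not stated in the report: rate needed to move a block in 256 usec.
implied_link_rate_bps = SAMPLES_PER_BLOCK * BITS_PER_SAMPLE / TRANSMIT_TIME_S

print(f"block period  = {block_period_s * 1e3:.1f} msec")
print(f"filter time   = {filter_time_s * 1e6:.1f} usec")
print(f"implied link  = {implied_link_rate_bps / 1e6:.0f} Mbit/sec")
```

The transmission time is ten times the filter time and comparable to the FFT and averager times, so communication is a large share of the delay through the network.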

Simulated  System 

A block diagram of the simulated system appears in Figure 6-17. Twenty processors
(two A/D input units, three filters, 13 FFTs, and two ensemble averagers) are
interconnected with 30 transmission links. No attempt was made to optimize the network.

Figure 6-16. Basic SONAR System

Figure 6-17. Simulated System


When  a block  of  data  becomes  available  at  either  A/D  converter  (as  noted  in  the 
previous  section,  each  A/D  generates  a block  of  data  every  1.6  msec)  it  is  sent  to 
two  filters,  or  alternatively  to  the  same  filter  twice;  this  allows  each  block  of  data 
to  be  processed  with  two  cutoff  frequencies.  The  filtered  data  is  then  transformed 
and  finally  sent  to  the  ensemble  averager  which  serves  as  a data  sink. 

Each  processor  was  assumed  to  be  single  buffered  so  that  at  most  one  data  block 
could  be  resident  at  each  processor.  Thus,  the  basic  steps  in  the  flow  of  information 
through  a node  are  to  receive  data,  process  it,  and  send  it  to  the  next  appropriate 
node.  If  a node  is  busy  processing  when  a request  to  send  more  data  to  it  is  received, 
it  replies  with  processor  busy  (PRCBSY). 
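The single-buffer rule can be illustrated with a small state holder. This is a sketch under assumptions: the class and method names are hypothetical, and the buffer is treated as occupied from the moment a request is granted until the block has been passed on.

```python
# Hypothetical model of a single-buffered underlying processor.
class SingleBufferProcessor:
    def __init__(self):
        self.block = None              # at most one resident data block

    def get_prc(self, block_id):
        """Answer a GETPRC request with PRCRDY or PRCBSY."""
        if self.block is not None:
            return "PRCBSY"            # buffer occupied: refuse the new block
        self.block = block_id
        return "PRCRDY"

    def finish(self):
        """Release the buffer once the block has been sent to the next node."""
        done, self.block = self.block, None
        return done

filter_node = SingleBufferProcessor()
replies = [filter_node.get_prc(1), filter_node.get_prc(2)]
filter_node.finish()
replies.append(filter_node.get_prc(2))
print(replies)
```

The second request is refused while block 1 is resident and succeeds only after the buffer has been released.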

If a multiple buffer scheme is implemented at each processor, the throughput time
will decrease somewhat. For example, if each processor has two buffers, then one
buffer can be filling while the processor operates on the data in the second buffer.
When the input buffer is full and processing is complete, processing will start on
the input buffer while the data in the other buffer is transmitted to the next
processor. Double buffering is of greatest benefit when the transmission time is of the
same order as the processing time.

This simulation assumes a static topology, which relieves the node controllers of
the need to create and update the path matrices. Thus, the tasks performed by a node
controller are:

1)  Executing  path  selection  algorithm 

2)  Constructing  paths  to  other  nodes 

3)  Servicing  requests  to  initiate  or  terminate  connections  between  ports 

4)  Keeping  track  of  busy  ports  at  the  node  and  busy  links  in  the  network. 

Results

The  SALSIM  simulation  generates  two  types  of  output: 

1)  Action  listings  which  report  what  happened  at  various  simulated  times 

2)  Result  tables  which  summarize  the  observed  characteristics  of  the  system. 

Action Listings. The action lists facilitate minute examination of the network
operation. There are two forms of action lists: the proposed path file and various
forms of message trace lists.

Proposed Path File. The proposed path file, which is a chronological list of
every path proposed by all the node controllers, reports:

1) Which node is proposing the path

2) Which nodes and links comprise the proposed path

3) Which data block is to be sent along the path

4) What time the path was selected

5) Which data block is presently using the links in the proposed path (if the
links are free then the user is "0").

The list also reports the node controller's success or failure at completing the
proposed path and the reason for the failure (e.g., which link was busy).

The first entry in the example proposed path file (Table 6-1) will be used for
explanation. The proposed path is to be used for data block 1, and was selected at
10 μsec after system startup (t=0). The path originates from node 1 and leads to
node 4 using nodes 6 and 20 as intermediate connections. The links used are 2
(between nodes 1 and 6), 3 (between nodes 6 and 20), and 4 (between nodes 20 and 4).
None of these links is busy at this time (10 μsec), which is indicated by the "0"
under each link number. If some data block had been present on any of the links
(2, 3, or 4), its number would have been placed in the USERS row under the
appropriate link.

The fourth entry shows that the path proposed in the first entry was finally
established at 25 μsec. Thus, the node controller took 15 μsec to acquire the
proposed path and to receive permission to use the underlying processor at node 4.
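The 15 μsec figure is simply the difference between two entries for the same data block. The sketch below pairs each PROPOSED entry with the next ESTABLISHED entry for that block; the tuple layout is a hypothetical representation of the file, and only the two times confirmed in the text (10 and 25 μsec for data block 1) are used.

```python
# Hypothetical event tuples: (kind, data_block_id, time_in_usec).
EVENTS = [
    ("PROPOSED", 1, 10),      # first entry of the proposed path file
    ("ESTABLISHED", 1, 25),   # fourth entry of the proposed path file
]

def establish_times(events):
    """Return {data_block_id: usec from path proposal to path establishment}."""
    proposed = {}
    result = {}
    for kind, block_id, time_us in events:
        if kind == "PROPOSED":
            proposed[block_id] = time_us          # latest proposal wins
        elif kind == "ESTABLISHED" and block_id in proposed:
            result[block_id] = time_us - proposed.pop(block_id)
    return result

print(establish_times(EVENTS))
```

Because a later proposal for the same block overwrites the earlier one, a path that is re-proposed after a failure is timed from its most recent proposal.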


Table  6-1.  Proposed  Path  File 


ENTRY  EVENT             ID  TIME     NODES          LINKS        USERS     BUSY

  1    PROPOSED PATH      1  .000010  1 6 20 4       2 3 4        0 0 0
  2    PROPOSED PATH      3  .000010  2 9 3          15 21        0 0
  3    PATH ESTABLISHED   3  .000019
  4    PATH ESTABLISHED   1  .000025
  5    PROPOSED PATH      4  .000269  2 9 3          15 21        0 0
  6    PROPOSED PATH      2  .000294  1 6 20 4       2 3 4        0 0 0
  7    PATH ESTABLISHED   4  .000298
  8    PATH FAILURE       4  .000300                                        PROCESSOR 3
  9    PATH ESTABLISHED   2  .000309
 10    PROPOSED PATH      4  .000312  2 12 15        16 20        0 0
 11    PROPOSED PATH      3  .000314  3 9            21           0
 12    PATH ESTABLISHED   3  .000316
 13    PROPOSED PATH      1  .000319  4 18           9            0
 14    PATH ESTABLISHED   4  .000320
 15    PATH FAILURE       2  .000322                                        PROCESSOR 4
 16    PATH ESTABLISHED   1  .000323
 17    PROPOSED PATH      2  .000334  1 13 2 9 3     6 11 15 21   0 0 0 3
 18    PATH FAILURE       2  .000349                                        LINK 21
 19    PROPOSED PATH      2  .000362  1 13 2 12 15   6 11 16 20   0 0 4 4

The eighth entry in the proposed path file indicates a failure by node controller
2 to gain access to the underlying processor at node 3. All the links for the
proposed path were acquired, but the processor was already in use.

Message  Traces.  The  other  method  for  examining  the  detailed  operation  of  the 
network  is  to  use  the  message  traces.  Three  types  of  message  traces  are  generated 
by  this  simulation:  trace  by  node,  trace  by  message,  and  trace  by  time.  All  the 
message  traces  have  the  same  format  but  are  grouped  according  to  a node,  message,  or 
time. For instance, the trace by node chart (Table 6-2) groups all the messages
transmitted and received at a particular node, e.g., node 1, and sorts them
chronologically. Thus, the message activity of any node can be examined in detail.

The  trace  by  message  chart  (Table  6-3)  groups  all  the  messages  associated  with 
a particular  data  block  as  it  moves  through  the  system.  Thus,  the  progress  of  a data 
block  generated  at  node  1 (one  of  the  A/D  converters)  at  the  beginning  of  simulation 
can  be  followed  in  the  net  as  it  moves  from  the  source  to  a filter  and  so  on.  Also, 
all  the  commands  and  responses  used  to  generate  the  path  (including  the  partial  paths) 
for  each  data  block  are  included  in  the  trace. 

The trace by time chart (Table 6-4) groups all the messages in the system according
to the time at which they were created. This trace is used to examine the overall
interaction  of  the  node  controllers.  Network  characteristics,  such  as  bottlenecks 
and  deadly  embraces,  can  be  seen  quite  easily  by  looking  at  all  the  messages  generated 
in  the  system  at  any  instant. 

Each line of the message traces contains the simulation time when the message was
created, the message type, the data block number, the function and number of the node
of interest, and some housekeeping data used by the simulation program. The fourth
column of the housekeeping data on the chart lists the destination node. For example,
the second entry in the trace by node chart shows that a GTLNK message originates at
node 1, a source node, that it was created at t = 10 μsec for data block 1, and that it
is being sent to node 6 (column 4). Column 1 of the housekeeping data lists the node
from which the response originated. For example, entry four of the trace by node
listing is a LINKES response sent from node 6 (column 1) to node 1 (column 4) which
was received at t = 14 μsec.
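A trace line of this kind maps naturally onto a small record type. The sketch below is illustrative only: the field names follow the six housekeeping columns of the Table 6-2 legend, the whitespace-separated layout is an assumption about the printout, and the sample line is the second entry of the trace by node as described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceRecord:
    time_us: int         # simulation time, converted to microseconds
    message: str         # message type, e.g. GTLNK or LINKES
    data_block: int
    node_function: str   # role of the node of interest, e.g. SOURCE
    node: int
    action: str          # e.g. TRANSMIT or PROCESSOR
    originator: int      # housekeeping column 1
    intermediate: int    # housekeeping column 2
    receiver: int        # housekeeping column 3
    destination: int     # housekeeping column 4
    link: int            # housekeeping column 5
    hops: int            # housekeeping column 6

def parse_trace_line(line):
    """Parse one whitespace-separated trace line (assumed 12-field layout)."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    time_us = round(float(fields[0]) * 1e6)
    housekeeping = [int(value) for value in fields[6:]]
    return TraceRecord(time_us, fields[1], int(fields[2]), fields[3],
                       int(fields[4]), fields[5], *housekeeping)

record = parse_trace_line(".000010 GTLNK 1 SOURCE 1 TRANSMIT 1 1 6 6 2 1")
print(record.message, record.time_us, record.destination)
```

Sorting a list of such records by node, by data_block, or by time_us gives the three groupings used by the trace by node, trace by message, and trace by time listings.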

Result  Tables.  The  simulation  results  are  given  in  a number  of  tables.  The  four 
most  interesting  are: 

1) Throughput time

2)  Path  establish  time 

3)  Path  length 

4) Transmission type.

Table 6-2. Trace by Node

                                                         HOUSEKEEPING
TIME      MESSAGE  DATA  FUNCTION  NODE  FUNCTION       1   2   3   4   5   6

.000000   STX      1     SOURCE    1     OUTPUT         0   1   0   0   0   1
.000010   GTLNK    1     SOURCE    1     TRANSMIT       1   1   6   6   2   1
.000012   STX      2     SOURCE    1     OUTPUT QUE     0   1   0   0   0   1
.000014   LINKES   1     SOURCE    1     PROCESSOR      6   6   1   1   2   1
.000014   GTLNK    1     SOURCE    1     TRANSMIT       1   1   6  20   3   1
.000019   LINKES   1     SOURCE    1     PROCESSOR     20   6   1   1   3   1
.000019   GTLNK    1     SOURCE    1     TRANSMIT       1   1  20   4   4   2
.000025   LINKES   1     SOURCE    1     PROCESSOR      4  20   1   1   4   2
.000025   GETPRC   1     SOURCE    1     TRANSMIT       1   1   4   0   0   3
.000026   PRCRDY   1     SOURCE    1     PROCESSOR      4   4   1   0   0   3
.000026   STX      1     SOURCE    1     TRANSMIT       0   1   4   0   0   3
.000284   ENDXMT   1     SOURCE    1     PROCESSOR      0   0   0   0   0   1
.000284   BFRLNK   1     SOURCE    1     TRANSMIT       1   1   6   4   2   1
.000284   STX      2     SOURCE    1     OUTPUT         0   1   0   0   0   1
.000294   GTLNK    2     SOURCE    1     TRANSMIT       1   1   6   6   2   1
.000296   LINKES   2     SOURCE    1     PROCESSOR      6   6   1   1   2   1
.000296   GTLNK    2     SOURCE    1     TRANSMIT       1   1   6  20   3   1
.000303   LINKES   2     SOURCE    1     PROCESSOR     20   6   1   1   3   1
.000303   GTLNK    2     SOURCE    1     TRANSMIT       1   1  20   4   4   2
.000309   LINKES   2     SOURCE    1     PROCESSOR      4  20   1   1   4   2
.000309   GETPRC   2     SOURCE    1     TRANSMIT       1   1   4   0   0   3
.000322   PRCBSY   2     SOURCE    1     PROCESSOR      4   4   1   0   0   3
.000322   BFRLNK   2     SOURCE    1     TRANSMIT       1   1   6   4   0   1
.000334   GTLNK    2     SOURCE    1     TRANSMIT       1   1  13  13   6   1
.000337   LINKES   2     SOURCE    1     PROCESSOR     13  13   1   1   6   1
.000337   GTLNK    2     SOURCE    1     TRANSMIT       1   1  13   2  11   1
.000342   LINKES   2     SOURCE    1     PROCESSOR      2  13   1   1  11   1
.000342   GTLNK    2     SOURCE    1     TRANSMIT       1   1   2   9  15   2
.000347   LINKES   2     SOURCE    1     PROCESSOR      9   2   1   1  15   2
.000347   GTLNK    2     SOURCE    1     TRANSMIT       1   1   9   3  21   3
.000349   LNKBSY   2     SOURCE    1     PROCESSOR      9   9   1   1  21   1
.000349   BFRLNK   2     SOURCE    1     TRANSMIT       1   1  13   9   0   1
.000362   GTLNK    2     SOURCE    1     TRANSMIT       1   1  13  13   6   1
.000365   LINKES   2     SOURCE    1     PROCESSOR     13  13   1   1   6   1
.000365   GTLNK    2     SOURCE    1     TRANSMIT       1   1  13   2  11   1
.000370   LINKES   2     SOURCE    1     PROCESSOR      2  13   1   1  11   1
.000370   GTLNK    2     SOURCE    1     TRANSMIT       1   1   2  12  16   2
.000373   LNKBSY   2     SOURCE    1     PROCESSOR      2   2   1   1  16   1
.000373   BFRLNK   2     SOURCE    1     TRANSMIT       1   1  13   2   0   1
.000386   GTLNK    2     SOURCE    1     TRANSMIT       1   1   6   6   2   1
.000388   LINKES   2     SOURCE    1     PROCESSOR      6   6   1   1   2   1
.000388   GTLNK    2     SOURCE    1     TRANSMIT       1   1   6  20   3   1
.000393   LINKES   2     SOURCE    1     PROCESSOR     20   6   1   1   3   1
.000393   GTLNK    2     SOURCE    1     TRANSMIT       1   1  20  19   6   2
.000396   LINKES   2     SOURCE    1     PROCESSOR     19  20   1   1   6   2
.000396   GTLNK    2     SOURCE    1     TRANSMIT       1   1  19  12  17   3
.000403   LINKES   2     SOURCE    1     PROCESSOR     12  19   1   1  17   3
.000403   GTLNK    2     SOURCE    1     TRANSMIT       1   1  12  15  20   4
.000406   LNKBSY   2     SOURCE    1     PROCESSOR     12  12   1   1  20   1
.000406   BFRLNK   2     SOURCE    1     TRANSMIT       1   1   6  12   0   1
.000419   GTLNK    2     SOURCE    1     TRANSMIT       1   1   6   6   2   1
.000422   LINKES   2     SOURCE    1     PROCESSOR      6   6   1   1   2   1
.000422   GTLNK    2     SOURCE    1     TRANSMIT       1   1   6  20   3   1
.000427   LINKES   2     SOURCE    1     PROCESSOR     20   6   1   1   3   1
.000427   GTLNK    2     SOURCE    1     TRANSMIT       1   1  20  19   6   2
.000432   LINKES   2     SOURCE    1     PROCESSOR     19  20   1   1   6   2
.000432   GTLNK    2     SOURCE    1     TRANSMIT       1   1  19  19  18   3
.000437   LINKES   2     SOURCE    1     PROCESSOR     19  19   1   1  18   3
.000437   GTLNK    2     SOURCE    1     TRANSMIT       1   1  19  15  29   4
.000442   LINKES   2     SOURCE    1     PROCESSOR     15  19   1   1  29   4
.000442   GETPRC   2     SOURCE    1     TRANSMIT       1   1  15   0   0   5
.000449   PRCBSY   2     SOURCE    1     PROCESSOR     15  15   1   0   0   5
.000449   BFRLNK   2     SOURCE    1     TRANSMIT       1   1   6  15   0   1
.000456   GTLNK    2     SOURCE    1     TRANSMIT       1   1  13  13   6   1
.000459   LINKES   2     SOURCE    1     PROCESSOR     13  13   1   1   6   1
.000459   GTLNK    2     SOURCE    1     TRANSMIT       1   1  13   2  11   1

Housekeeping Data

1. Originator

2. Intermediate node (if any)

3. Receiver

4. Destination

5. The link sought by a GTLNK command, the link established by a LINKES command,
   or the link that was unavailable (LNKBSY).

6. The number of links between the intermediate node and the receiver.

Table 6-3. Trace by Message

                                                         HOUSEKEEPING
TIME      MESSAGE  DATA  FUNCTION  NODE  FUNCTION       1   2   3   4   5   6

.000000   STX      1     SOURCE    1     OUTPUT         0   1   0   0   0   1
.000010   GTLNK    1     SOURCE    1     TRANSMIT       1   1   6   6   2   1
.000013   GTLNK    1     FFT       6     PROCESSOR      1   1   6   6   2   1
.000013   LINKES   1     FFT       6     TRANSMIT       6   6   1   1   2   1
.000014   LINKES   1     SOURCE    1     PROCESSOR      6   6   1   1   2   1
.000014   GTLNK    1     SOURCE    1     TRANSMIT       1   1   6  20   3   1
.000015   GTLNK    1     FFT       6     PROCESSOR      1   1   6  20   3   1
.000015   GTLNK    1     FFT       6     TRANSMIT       1   6  20  20   3   1
.000017   GTLNK    1     INTEGR    20    PROCESSOR      1   6  20  20   3   1
.000017   LINKES   1     INTEGR    20    TRANSMIT      20  20   6   1   3   1
.000018   LINKES   1     FFT       6     PROCESSOR     20  20   6   1   3   1
.000018   LINKES   1     FFT       6     TRANSMIT      20   6   1   1   3   1
.000019   LINKES   1     SOURCE    1     PROCESSOR     20   6   1   1   3   1
.000019   GTLNK    1     SOURCE    1     TRANSMIT       1   1  20   4   4   2
.000020   GTLNK    1     INTEGR    20    PROCESSOR      1   1  20   4   4   2
.000020   GTLNK    1     INTEGR    20    TRANSMIT       1  20   4   4   4   1
.000022   GTLNK    1     FILTER    4     PROCESSOR      1  20   4   4   4   1
.000022   LINKES   1     FILTER    4     TRANSMIT       4   4  20   1   4   1
.000024   LINKES   1     INTEGR    20    PROCESSOR      4   4  20   1   4   1
.000024   LINKES   1     INTEGR    20    TRANSMIT       4  20   1   1   4   2
.000025   LINKES   1     SOURCE    1     PROCESSOR      4  20   1   1   4   2
.000025   GETPRC   1     SOURCE    1     TRANSMIT       1   1   4   0   0   3
.000025   GETPRC   1     FILTER    4     PROCESSOR      1   1   4   0   0   3
.000025   PRCRDY   1     FILTER    4     TRANSMIT       4   4   1   0   0   3
.000026   PRCRDY   1     SOURCE    1     PROCESSOR      4   4   1   0   0   3
.000026   STX      1     SOURCE    1     TRANSMIT       0   1   4   0   0   3
.00028?   STX      1     FILTER    4     INPUT          0   1   4   0   0   3
.000284   ENDXMT   1     SOURCE    1     PROCESSOR      0   0   0   0   0   1
.000284   BFRLNK   1     SOURCE    1     TRANSMIT       1   1   6   4   2   1
.000284   BFRLNK   1     FFT       6     PROCESSOR      1   1   6   4   2   1
.000284   BFRLNK   1     FFT       6     TRANSMIT       1   6  20   4   2   1
.000286   BFRLNK   1     INTEGR    20    PROCESSOR      1   6  20   4   2   1
.000286   BFRLNK   1     INTEGR    20    TRANSMIT       1  20   4   4   2   1
.000287   BFRLNK   1     FILTER    4     PROCESSOR      1  20   4   4   2   1
.000309   STX      1     FILTER    4     OUTPUT         0   4   0   0   0   3
.000319   GTLNK    1     FILTER    4     TRANSMIT       4   4  18  18   9   1
.000322   GTLNK    1     FFT       18    PROCESSOR      4   4  18  18   9   1
.000322   LINKES   1     FFT       18    TRANSMIT      18  18   4   4   9   1
.000323   LINKES   1     FILTER    4     PROCESSOR     18  18   4   4   9   1
.000323   GETPRC   1     FILTER    4     TRANSMIT       4   4  18   0   0   1
.000324   GETPRC   1     FFT       18    PROCESSOR      4   4  18   0   0   1
.000324   PRCRDY   1     FFT       18    TRANSMIT      18  18   4   0   0   1
.000325   PRCRDY   1     FILTER    4     PROCESSOR     18  18   4   0   0   1
.000325   STX      1     FILTER    4     TRANSMIT       0   4  18   0   0   1
.000682   STX      1     FFT       18

Table 6-4. Trace by Time

The throughput time is defined as the total time which elapses from creation of
the  data  block  at  the  source  until  completion  of  the  transmission  of  the  data  block  to 
either  of  the  ensemble  averagers.  The  TIME  column  on  the  throughput  time  table  (Table 
6-5) quantizes the throughput times into bins of 500 μsec; notice, for example, that 10
data blocks were processed through the system with times ranging from 1000 to 1500 μsec.
When  the  expected  value  and  standard  deviation  are  calculated,  the  exact  values  of  the 
throughput  time  (instead  of  the  quantized  values)  are  used. 
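The distinction between the quantized table and the exact statistics can be made concrete with a short sketch. The sample times below are invented for illustration (they are not simulation results), the bins are labeled by their lower edge, and the use of a population standard deviation is an assumption, since the report does not say which form SALSIM uses.

```python
import statistics
from collections import Counter

def summarize_throughput(times_us, bin_width_us=500):
    """Histogram by 500-usec bin, but mean and deviation from the exact values."""
    bins = Counter(int(t // bin_width_us) * bin_width_us for t in times_us)
    mean = statistics.mean(times_us)
    std = statistics.pstdev(times_us)   # population form: an assumption
    return dict(sorted(bins.items())), mean, std

# Invented sample times in microseconds, for illustration only.
SAMPLE_TIMES_US = [1100.0, 1400.0, 1600.0]
bins, mean_us, std_us = summarize_throughput(SAMPLE_TIMES_US)
print(bins, round(mean_us, 1), round(std_us, 1))
```

Computing the mean from the bin labels instead of the exact values would shift it by up to one bin width, which is why the exact values are used.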


Table 6-5. Throughput Time Distribution

The time taken by a node controller to establish a proposed path is affected by
two factors: path length and node utilization in the proposed paths. Node controllers
that must service many link requests will take longer, on the average, to respond to
a request. As can be seen from the path establish time (Table 6-6), no paths were
completed in 10 μsec or less, whereas 45 paths were completed within 10 to 20 μsec after
the initial path request by the underlying processor. It was assumed that the path
selection logic would always take 10 μsec to select a path, regardless of the length.
Thus, no path can be acquired before the complete path has been specified by the path
selection logic. So, for each of the 45 paths completed within 10 to 20 μsec, the
first 10 μsec was assumed spent in selecting the path; the remaining time was utilized
in acquiring the path one link at a time. Partially completed paths are not included
in the table.

Table 6-6. Path Establish Time

Table 6-7 lists the path length. The type number is the number of links in the
path, e.g., a type 3 path contains three links. Here again, partial paths are not
included in this table. This table brings to light a very desirable feature of this
network. The node controllers are programmed to use the shortest paths. Two-thirds
of the paths involve only one or two links.


Table 6-7. Path Length Frequency

The final listing gives the transmission type (Table 6-8), which is a tabulation
of the frequency of use for each type of message in the system. The type number
corresponds to the following messages:


Type  Message 

1 STX 

2 LINKES 

3 LNKBSY 

4 GTLNK 

5 BFRLNK 

6 PRCRDY 

7 GETPRC 

8 (UNUSED) 

9 PRCBSY 

Table  6-8.  Transmit  Type  Frequency 

TYPE   FREQUENCY

  1        60
  2       860
  3        35
  4       895
  5       520
  6        60
  7       145
  8         0
  9        85

TOTAL SAMPLES 2660    EXP. VALUE 3.8    STD. DEV. 1.7

Some of these messages are used by the simulation program to signal events to
the executive routine. For example, an STX message is generated whenever an
underlying processor has finished processing a data block and needs to send the new data
to another processor.

The  most  widely  used  message  is  the  "GTLNK"  (type  4)  since  this  command  is  used 
to  acquire  the  links  for  a path.  Notice  that  a busy  link  is  encountered  less  than  5 
percent  of  the  time  (only  35  LNKBSY  responses,  type  3);  however,  over  half  of  the  time 
the  requested  processor  (requested  by  a GETPRC  command,  type  7)  is  not  available  (85 
PRCBSY  responses,  type  9).  This  is  mainly  due  to  bottlenecks  at  the  ensemble  averagers. 
BFRLNK  commands  proliferate  since  they  are  used  to  free  the  links  when  a transmission 
is  completed  and  to  terminate  unsuccessful  attempts  to  acquire  paths. 
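The two percentages quoted above can be checked directly from the message counts. In this sketch the LINKES and GETPRC counts are read from Table 6-8 and the LNKBSY and PRCBSY counts are those quoted in the text; treating every link request as answered by exactly one LINKES or LNKBSY reply is an assumption about the protocol.

```python
# Message counts: LNKBSY and PRCBSY as quoted in the text,
# LINKES and GETPRC as read from Table 6-8.
COUNTS = {"LINKES": 860, "LNKBSY": 35, "GETPRC": 145, "PRCBSY": 85}

# Assumption: each link request draws exactly one LINKES or LNKBSY reply.
link_replies = COUNTS["LINKES"] + COUNTS["LNKBSY"]
link_busy_fraction = COUNTS["LNKBSY"] / link_replies

# Each GETPRC is answered by either PRCRDY or PRCBSY.
processor_busy_fraction = COUNTS["PRCBSY"] / COUNTS["GETPRC"]

print(f"busy link:      {100 * link_busy_fraction:.1f} percent")
print(f"busy processor: {100 * processor_busy_fraction:.1f} percent")
```

Under these assumptions the busy-link rate comes out a little under 4 percent and the busy-processor rate a little under 59 percent, in line with the "less than 5 percent" and "over half" statements.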

Conclusions 

The overhead is defined as the difference between the actual throughput time and
the sum of the throughput times for each required process:

    Overhead = T_actual - T_theory

where

    T_theory = T_filter + T_FFT + T_EA + 3 T_comm

    T_actual = simulated actual delay

    T_filter = filter delay

    T_FFT    = FFT delay

    T_EA     = ensemble averager delay

    T_comm   = time required to transmit a data block between processors.

The  amount  of  overhead  incurred  by  a data  block  is  a function  of  the  network  topology 
and  the  number  of  each  type  of  processor  in  the  system.  For  the  system  as  shown, 
overhead  ranged  from  10  to  100  percent  of  the  theoretical  throughput.  The  average 
was  about  30  percent.  A big  bottleneck  in  this  network  is  the  ensemble  averager.  If 
these  units  were  implemented  with  more  input  ports,  better  throughput  would  be  achieved. 
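Using the processing and transmission times given in the SONAR system description, the theoretical throughput and the overhead percentage can be worked out as follows. The function name and the example value of the actual delay are illustrative; the example delay is chosen only to correspond to the quoted average of about 30 percent.

```python
# Delays in microseconds, from the SONAR system description.
T_FILTER_US = 25.6
T_FFT_US = 200.0
T_EA_US = 256.0
T_COMM_US = 256.0

# Three transmissions: A/D to filter, filter to FFT, FFT to ensemble averager.
T_THEORY_US = T_FILTER_US + T_FFT_US + T_EA_US + 3 * T_COMM_US

def overhead_percent(t_actual_us, t_theory_us=T_THEORY_US):
    """Overhead as a percentage of the theoretical throughput time."""
    return 100.0 * (t_actual_us - t_theory_us) / t_theory_us

example_actual_us = 1.30 * T_THEORY_US   # illustrative: 30 percent overhead
print(f"T_theory = {T_THEORY_US:.1f} usec")
print(f"overhead = {overhead_percent(example_actual_us):.1f} percent")
```

The theoretical time of about 1250 μsec means that the quoted range of 10 to 100 percent overhead corresponds to actual throughput times of roughly 1375 to 2500 μsec.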

No attempt was made to optimize the processing flow. The optimum topology for
this system would minimize the number of links between adjacent processing stages. In
the network of Figure 6-18, the probability of completing a path would increase and
the time needed to build a path would decrease. These improvements should reduce the
overhead to around 6 percent of the theoretical delay, neglecting bottlenecking
situations.

The "optimum" topology for a signal processing system is difficult to define
because each system emphasizes different features. The topology in Figure
6-18 minimizes throughput time but is still subject to bottlenecks at the ensemble
averagers.


Figure  6-18.  Improved  SONAR  Network 


An encouraging result from the simulation was the fact that the node controllers
were idle much of the time owing to their high speed and to the lack of more
data-producing elements in the system. The nodes with highest utilization percentages
were trying to create paths to the ensemble averagers which were already tied up
processing previous data. In fact, all the data blocks go through the system up to
the ensemble averagers with an average overhead of 6 percent of the throughput time up
to that point. The addition of just two more ensemble averagers would have resulted
in a total overhead of less than 8 percent of the theoretical time.

Much of the node controller's overhead (over 50 percent) is caused by execution
of the path selection algorithm. In the detailed design of the node controller, much
of the design effort was directed towards simplifying and increasing the speed of
the hardware used in the final implementation. Actual path creation, which involves
generating the appropriate GTLNK commands and interpreting the replies, accounts for
another 35 to 40 percent of node overhead. The remaining time is used to update port
status tables and the scratchpad path matrix or to generate responses to commands
received from other nodes.

6.3.2  Recommendations  for  Future  Work 

Determining  the  optimality  of  a complex  system,  such  as  the  interconnection 
scheme  presented  in  this  report,  is  a difficult  problem.  Verifying  that  the  end 
results  are  acceptable  does  not  prove  that  the  method  used  to  generate  those  results 
is  correct. 

Even  a simulation,  such  as  reported  in  this  section,  serves  only  to  show  that 
the  system  is  free  from  deadlocks  for  the  configuration  which  was  simulated. 

Several  additional  simulations  should  be  performed  to  establish  the  optimum 
network configuration for this problem. The network can then be perturbed to determine
the relationship between the overhead and the network configuration. Simulations
should  also  be  performed  for  other  types  of  networks. 

Priority  was  not  incorporated  in  the  simulation  because  all  messages  presumably 
have  equal  priority  in  the  SONAR  problem.  Definition  of  multifunction  networks,  with 
a significant  grouping  of  data  into  separate  priority  classes,  is  necessary  for  such 
a simulation  to  provide  meaningful  results. 

The  other  problem  of  interest  is  the  behavior  upon  startup  of  a transparent 
interface  system  with  adaptive  path  matrices.  The  question  is  whether  the  path 
matrices  will  eventually  converge  to  full  information  about  the  network  structure, 
or whether the matrices stabilize before learning the entire structure. This question
is only of importance if the system is expected to change in configuration on a
rapid basis; otherwise, fixed tables (i.e., PROMs) can be used without difficulty.

The  results  to  date  have  been  very  encouraging;  as  indicated,  several  logical 
steps are to be taken next. These include additional simulations of increasing
complexity to fully evaluate the potential of the transparent interface system.

6.4 DETAILED DESIGN RESULTS

Previous sections have described the algorithms which are to be implemented at
each network node. In this section, two chip designs are presented: the crosspoint
switch and the path selection logic. The former is an 8 x 4 matrix of switches with
integral control logic to simplify the implementation of any circuit switching network.
The first approach to realizing the path selection logic was based on the Intel 8080
microcomputer; because that approach was too slow, a dedicated logic implementation
was designed, as described in this section. It is amenable to realization on a single
LSI circuit.

When the two circuits described here are developed, the implementation of circuit
switching networks for signal processor interconnection will be greatly simplified.

6.4.1  Crosspoint  Switch 

An  important  aspect  of  the  system  interface  study  performed  on  this  contract  is 
the  identification  of  hardware  functions  required  for  efficient  implementation  of  the 
connection algorithms. Although several functions have been identified, the most
critical is the crosspoint switch.

Crosspoint  switches  are  required  at  each  system  node  which  has  more  than  a single 
data  link.  This  section  describes  the  design  and  use  of  the  crosspoint  switch. 


The  basic  crosspoint  switch,  as  shown  in  Figure  6-19,  consists  of  an  8 x 4 matrix 
of  individual  bilateral  switches  with  the  necessary  logic  to  activate  or  deactivate 
the  switches.  Logic  is  included  to  permit  connecting  or  disconnecting  any  pair  of  the 
eight  ports  and  to  connect  any  of  the  ports  to  any  of  the  four  trunks.  The  latter 
feature is used for expansion, as described later in this section. Although this
design is intended to satisfy the path switching requirement of the transparent
interface system, it is expected that this device will also satisfy many other
requirements for circuit switching networks where either analog or digital
communication is to be implemented.

Figure 6-19. Basic Crosspoint Switch

Detailed Design of the Crosspoint Switch

The  crosspoint  switch  consists  of  an  8 x 4 array  of  field  effect  transistors 
which  are  controlled  to  connect  selected  pairs  of  the  eight  data  ports.  The  switch 
implemented is a four-trunk, nonhierarchical, eight-member switch of concurrency four.
For  eight  port  serial  nodes  a single  chip  is  used;  larger  nodes  (i.e.,  with  more  than 
eight  ports)  are  implemented  by  cascading  crosspoint  switches  both  horizontally  and 
vertically.  A small  amount  of  additional  circuitry  is  required  for  cascading. 

There are two types of connections possible with this chip: port-to-port and
port-to-trunk. When performing a port-to-port connection, the two port numbers
(ranging from 0 to 7) are presented to the crosspoint switch simultaneously, and
the trunk used for the connection is selected automatically. During a port-to-trunk
connection, the port number and trunk number (ranging from 0 to 3) are presented
to the chip, thus specifying which switch in the crosspoint is to be closed.
This  operation  is  performed  when  a connection  between  two  horizontally  cascaded 
chips  is  desired.  The  terms  horizontal  and  vertical  refer  to  the  port  lines  and 
the  trunks  respectively.  Thus,  an  8 x 4 crosspoint  has  eight  port  lines  and  four 
trunks.  If  it  were  cascaded  in  the  horizontal  direction  using  another  chip  of  the 
same  configuration,  it  would  have  16  lines  and  four  trunks  (16  x 4).  If  the  port 
or  trunk  specified  in  the  port-to-trunk  operation  is  already  in  use,  then  the  busy 
line  is  raised  to  notify  the  requester  that  the  desired  connection  cannot  be  made. 
Similarly,  in  port-to-port  operation,  if  either  port  is  busy  or  if  all  the  trunks 
are  presently  in  use,  then  the  busy  line  will  be  raised,  indicating  a busy  condition. 
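
The two connection types and the busy rules described above can be summarized in a
small behavioral sketch. This is not the chip's logic design; it is a minimal Python
model of a single, uncascaded 8 x 4 switch. The class and method names are
hypothetical, and rejecting a request that names the same port twice is an assumption
not stated in the text.

```python
class Crosspoint8x4:
    """Behavioral sketch of one uncascaded 8 x 4 crosspoint chip (names are illustrative)."""

    PORTS = 8
    TRUNKS = 4

    def __init__(self):
        # closed[t] is the set of port lines currently switched onto trunk t.
        self.closed = [set() for _ in range(self.TRUNKS)]

    def port_busy(self, port):
        return any(port in ports for ports in self.closed)

    def trunk_busy(self, trunk):
        return bool(self.closed[trunk])

    def port_to_port(self, a, b):
        """Connect ports a and b; the chip picks the trunk.

        Returns the trunk number used, or None to model the busy line being raised.
        """
        if a == b or self.port_busy(a) or self.port_busy(b):
            return None  # a == b rejection is an assumption of this sketch
        for trunk in range(self.TRUNKS):  # lowest-numbered unused trunk is chosen
            if not self.trunk_busy(trunk):
                self.closed[trunk] = {a, b}
                return trunk
        return None  # all four trunks are in use

    def port_to_trunk(self, port, trunk):
        """Close the single switch (port, trunk). Returns False to model busy."""
        if self.port_busy(port) or self.trunk_busy(trunk):
            return False
        self.closed[trunk] = {port}
        return True

    def disconnect(self, port):
        """Open every switch on this port's line."""
        for ports in self.closed:
            ports.discard(port)
```

With four trunks, at most four connections can exist at once, which is the
"concurrency four" property mentioned above.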

There are 28 functional pins on the crosspoint switch: four trunk lines labeled
TRUNK0-TRUNK3; eight port lines (so named since they originate from the node ports)
named PORT0-PORT7; three port A select lines (A0, A1, and A2); three port B or trunk
select lines (B0, B1, and B2); an ENAB line used to enable the decoders for the select
lines; a PORT-TO-PORT line which is high for a port-to-port connection and low for a
port-to-trunk connection; the CONNECT line which is used to indicate whether a make or
break is desired; the busy line; and six lines used for cascading. The detailed logic
diagram for the 8 x 4 crosspoint is shown in Figure 6-20.
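
As a quick consistency check on the pin list above, the groups can be tallied. The
dictionary keys below are only illustrative labels; the counts are the ones given in
the text.

```python
# Pin groups as listed in the text; key strings are illustrative labels only.
PIN_GROUPS = {
    "trunk lines": 4,
    "port lines": 8,
    "port A select lines": 3,
    "port B / trunk select lines": 3,
    "ENAB": 1,
    "PORT-TO-PORT": 1,
    "CONNECT": 1,
    "busy": 1,
    "cascading lines": 6,
}


def functional_pin_count(groups=PIN_GROUPS):
    """Sum the pin counts of all groups."""
    return sum(groups.values())
```

The sum agrees with the 28 functional pins stated for the chip.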

Three  decoders  for  selecting  ports  or  trunks  are  shown  in  the  lower  left-hand 
corner  of  the  drawing.  The  A lines  are  always  decoded  since  there  is  at  least  one 
port  specified  in  either  the  port-to-port  or  port-to-trunk  connection.  The  three  B 
lines are decoded as the second port in the port-to-port operation, but only B0 and B1
are used in the port-to-trunk operation since the chip has only four trunks. Thus,
the upper decoder is enabled for every connection, the middle decoder for port-to-port
connections, and the lower decoder for port-to-trunk connections.
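
The decoder enabling rule can be sketched as follows. This is a hedged illustration
only: the function name and the integer encoding of the select lines (A0 and B0 as the
least significant bits) are assumptions, not details taken from Figure 6-20.

```python
def decode_select(a_lines, b_lines, port_to_port, enab=True):
    """Return (port_a, port_b, trunk); fields that are not decoded are None.

    a_lines and b_lines are the 3-bit values on A2..A0 and B2..B0
    (bit ordering is an assumption of this sketch).
    """
    if not enab:
        return (None, None, None)  # ENAB low: no decoder is active
    port_a = a_lines & 0b111  # upper decoder, enabled for every connection
    if port_to_port:
        return (port_a, b_lines & 0b111, None)  # middle decoder uses B0-B2
    return (port_a, None, b_lines & 0b011)  # lower decoder uses only B0 and B1
```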

The latching switch is used to make a connection between a port line and a trunk,
but it also contains a switch busy (SBSY) line. The switch is turned on by strobing
PS and TS simultaneously, which raises SBSY and turns the switches on, thus connecting
the port line (LA) and the trunk (LB). The switch is turned off by lowering
RST, which resets all the switches in a vertical column. All the SBSY lines for
each trunk are NORed together to produce a TRNKi BSY line which is used during
horizontal cascading.

The system controller determines if a port line is busy by referring to the eight
status latches located along the bottom of Figure 6-20. Each status latch is simply
an R-S latch which is set if this port is selected (ARBi=1) for a connect operation
(CONNECT=1) and if the latch is not already set (Q=0). The latch is reset if this
port is selected (ARBi=1) for a disconnect operation (DISCONNECT=1). (Note that
the disconnect operation resets all four switches associated with this port.) If it
is a connect operation and the port is already in use (Q=1), then PSBSYi is forced
low, which in turn forces PORT BUSY low.

When  a status  latch  is  set,  a pulse  is  sent  along  the  line  which  connects  all 
four  PS  inputs  together  for  that  port.  However,  a similar  pulse  is  needed  on  the  TS 
line  for  one  of  the  switches  in  order  to  make  a port-to-trunk  connection.  The  set  of 
gates  at  the  left  end  of  each  trunk  line  is  used  to  generate  a TS  pulse.  For  a port- 
to-trunk  operation,  the  middle  ANDN  gate  must  be  used.  This  gate  has  inputs  from 
three different sources: the DISCONNECT line, TRNKi BSY line, and TRNKi line.

The port-to-port operation does not specify which trunk is to be used for the connection.
Therefore, the chip must make this decision. The trunk selection logic picks the
lowest numbered, unused trunk. This logic is implemented with the lower ANDN gate of
the four-gate cluster located at the left end of each trunk. Using trunk two as an
example, we see that the inputs to the gate in question are TRNK2 BSY, TRNK1 BSY,
PTPCN and ATRBI. This trunk will be selected for a port-to-port operation if:

• It  is  not  currently  being  used  (TRNK2  BSY=0) 

• All preceding trunks (one and zero) are already in use (TRNK1 BSY and
TRNK0 BSY = 0)

• The  operation  being  performed  is  a port-to-port  operation  (PTPCN=0) 

• The  chips  are  vertically  cascaded,  and  all  the  trunks  in  the  chips 
below  this  particular  chip  are  already  in  use  (ATRBI=0). 

As  can  be  seen  from  Figure  6-20,  trunk  zero  will  always  be  chosen  first  for  a port-to- 
port  connection,  then  trunk  one  and  so  on. 
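
The selection rule listed above (a trunk is chosen only if it is free, every
lower-numbered trunk is busy, and every preceding chip in the cascade is full) can be
sketched in Python. The sketch uses plain booleans in which True means "busy"; it does
not reproduce the signal polarities of Figure 6-20. The function names and the
`preceding_chips_full` parameter (standing in for the ATRBI input) are hypothetical.

```python
def select_trunk(trunk_busy, preceding_chips_full=True):
    """Pick a trunk for a port-to-port operation on one chip.

    trunk_busy: list of booleans, True meaning the trunk is in use.
    Returns (trunk index or None, True if this chip and all preceding chips are full).
    """
    if not preceding_chips_full:
        return None, False  # an earlier chip in the cascade still has a free trunk
    for i, busy in enumerate(trunk_busy):
        # Trunk i is selected only if it is free and trunks 0..i-1 are all busy.
        if not busy and all(trunk_busy[:i]):
            return i, False
    return None, True


def select_in_cascade(chips):
    """Chain select_trunk through cascaded chips; returns (chip index, trunk) or None."""
    full = True
    for chip_index, trunk_busy in enumerate(chips):
        trunk, full = select_trunk(trunk_busy, full)
        if trunk is not None:
            return chip_index, trunk
    return None
```

For a single chip, `select_trunk([True, True, False, False])` picks trunk two, which
matches the trunk-two example in the text.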

In the event that one or both of the ports in the port-to-port operation, or the
port or trunk in the port-to-trunk operation, are busy, the busy flag is raised.
The three signals that contribute to the generation of BUSY are: SEL TRNK BSY, which
goes low if the trunk specified in the port-to-trunk operation is busy; ALL TRNKS BSY,
which goes low during a port-to-port operation if all trunks in every chip preceding
this one (in this column) are being used (ATRBI=1) and all four trunks in this chip
are in use; and PORT BUSY, which goes low if any port specified in either type of
connection is presently occupied.


Expansion 


A 16  x 8 crosspoint  switch  is  shown  in  Figure  6-21.  Notice  that  all  the  chips  in 
a column  have  their  port  A select  lines  tied  together  and  that  the  port  B select  lines 
for  all  chips  are  tied  together.  With  these  connections  extra  gates  are  needed  to 
ensure  that  the  proper  switches  are  enabled.  In  port-to-port  operation,  all  the 
switches  in  the  column  corresponding  to  the  specified  ports  are  enabled  (e.g.,  when 
connecting  port  9 to  port  13,  chips  21  and  22  must  be  enabled).  For  port-to-trunk 
operation,  all  the  chips  in  the  row  corresponding  to  the  specified  trunk  are  enabled 
(e.g., if trunk 5 is specified, then chips 12 and 22 are enabled). It is also necessary
to connect the trunk busy lines for each row to prevent two different chips from
trying to use the same trunk for two different port-to-port operations.
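
The chip-enable rule for the 16 x 8 arrangement can be illustrated with a short
sketch. The chip labels are inferred from the two examples in the text: the first
digit is taken to be the port group (1 for ports 0-7, 2 for ports 8-15) and the second
digit the trunk group (1 for trunks 0-3, 2 for trunks 4-7). That reading, and the
function names, are assumptions; Figure 6-21 is the authority.

```python
PORTS_PER_CHIP = 8
TRUNKS_PER_CHIP = 4


def chips_for_port_to_port(port_a, port_b):
    """Chips enabled for a port-to-port operation within one port group."""
    group_a = port_a // PORTS_PER_CHIP + 1
    group_b = port_b // PORTS_PER_CHIP + 1
    if group_a != group_b:
        # Ports on different chips are joined with two port-to-trunk operations.
        raise ValueError("ports belong to different chips")
    return [f"{group_a}{trunk_group}" for trunk_group in (1, 2)]


def chips_for_port_to_trunk(trunk):
    """Chips enabled for a port-to-trunk operation on the given trunk (0-7)."""
    trunk_group = trunk // TRUNKS_PER_CHIP + 1
    return [f"{port_group}{trunk_group}" for port_group in (1, 2)]
```

Under this labeling, connecting port 9 to port 13 enables chips 21 and 22, and
specifying trunk 5 enables chips 12 and 22, as in the examples above.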


When connecting two ports together that do not belong to the same chip (e.g.,
port 3 to port 11), two port-to-trunk operations must be performed, one for each port.
However, both connections must be made at the same time; otherwise the initial
port-to-trunk operation will lock out any future attempts to connect to the specified
trunk. With the present chip, a race condition occurs when performing simultaneous
port-to-trunk operations. There are ten gate delays between the initiation of a
port-to-trunk operation and the falling of TRNKi BSY, plus one more gate delay to raise
TRNKi BSY on the other cascaded chips. If TRNKi BSY is raised before a TSi pulse is
generated, then only one port-to-trunk operation will succeed, locking out the other
desired connection. To prevent this, the inverter connected to TRNKi BSY on the
right-hand side of Figure 6-20 could be replaced with the circuit of Figure 6-22. The
signals LTRNKi BSY and its complement would replace the present TRNKi BSY and its
complement, which are used in the trunk selection logic at the left-hand side of
Figure 6-20. The D flip-flop used as the busy latch should be rising-edge triggered so
that the value of TRNKi BSY before the simultaneous port-to-trunk operation is used.

A minor problem exists with the cascaded configuration of Figure 6-21 concerning
BUSY. Assume that a port-to-port connection is desired for ports 1 and 5 and that the
only trunk available is trunk 6 (which is in chip 12). When chips 11 and 12 are
enabled, chip 11 discovers that all its trunks are busy and lowers ATRBO, thus
signaling chip 12 to try one of its trunks. Chip 12 will use trunk 6 since it is the
only trunk available. However, BUSY will have gone low because all of the trunks
in chip 11 are busy. This would indicate that the connection had not been made, which
is, of course, in error. One solution to this problem is to add the external circuit
which is diagrammed in Figure 6-23. These three gates are all that is needed for
any number of switches cascaded in the same manner as in Figure 6-21.

Figure 6-21. 16 x 8 Crosspoint Switch

Figure 6-22. Latch for Expansion of Trunk Busy Signal (input: TRNKi BSY; outputs: LTRNKi BSY and its complement)


Figure  6-23.  Circuit  for  Expansion  of  Busy  Signal 

Recommendations  for  Future  Work 

In  the  definition  and  detailed  design  of  this  chip  two  minor  problem  areas  have 
been  discovered  as  described  in  the  preceding  section.  Although  solutions  have  been 
described, it is felt that further design effort can yield "cleaner" solutions. A
complete logic simulation should also be performed to verify the design and to generate
test sequences.

6.4.2  Path  Selection  Logic 

The  path  selection  algorithm  described  in  Section  6.2  can  be  implemented  with 
either  a microprocessor  (e.g.,  an  Intel  8080)  or  with  dedicated  logic.  In  the  initial 
phase of the study an Intel 8080 system was investigated; it is summarized here.
Although it established feasibility for the algorithms, its delay (on the order of
300 µsec to select a path) was deemed excessive.

A dedicated hardware approach was investigated and is described in detail in this
section. It comprises two parts: path selection logic and a path selection controller.
First-order timing indicates that this approach can select a path in under 10 µsec.

Microprocessor  Implementation 

A node  consists  of  a central  node  controller  and  some  number  of  ports.  At  each 
port  a line  interface  accepts  data  from  a communication  channel  (link)  and  presents  it 
to  the  node  controller.  The  line  interface  also  takes  data  from  the  node  controller 
and  sends  it  over  the  link. 

The node controller consists entirely of a microprocessor or special purpose processor
that is conducive to stack and matrix operations (e.g., an Intel 8080) to perform the
path matrix construction and the path selection algorithm (see Figure 6-24). However,
any processor will also require a moderate quantity of ancillary hardware, such as
clock generators, RAMs, ROM, etc.

Figure 6-24. Node Controller


The line interface units (Figure 6-25) consist of a line buffer for the communication
channel, a UAR/T for receiving and transmitting the data, two FIFOs (one for
input data and the other for output data), a box for detecting special messages, and a
box to control the interface functions.

The  line  buffer  consists  mostly  of  line  drivers  and  receivers,  but  will  also  select 
the  output  data  source  (i.e.,  the  UAR/T  or  the  switch).  The  UAR/T  performs  the  data 
detection  and  conversion  from  serial  to  parallel.  It  also  accepts  parallel  output  data, 
converts  it  to  serial,  and  transmits  the  data  with  the  proper  start  and  stop  bits.  The 
two FIFO queues are used for temporarily storing multiword commands that are used by
the nodes to establish paths.

Each  node  has  the  ability  to  connect  any  port  to  any  other  port.  This  crosspoint, 
nonhierarchical switch of concurrency M/2 (where M is the number of ports) is
implemented using the crosspoint switch described in the previous section.

The  control  box  decodes  and  executes  the  control  commands  from  the  processor  and 
senses  the  interface  status  which  is  continually  passed  back  to  the  node  controller. 

Two types of priority are used throughout this system: (1) dynamic priority,
associated with a message sent by a node; and (2) static priority, associated with each
port in a node. When a node has a message to send to another member of the network, it
generates a priority word for that message. This priority is dynamic in two ways:
(1) if the links it wishes to use are in use, it can raise its priority until those
links are obtained, or (2) if the message it is sending contains a section that is not
very important, it can change the priority during the message with the option of
raising the priority later in the message.

Two  kinds  of  priority  conflicts  arise  in  this  system.  The  first  conflict  occurs 
when  a node  controller  is  trying  to  create  a path  using  a link  that  is  presently 
employed  by  a different  node.  The  busy  link  has  a priority  word  stored  in  the  memory 
of  the  node  controller.  This  priority  word  is  compared  with  the  priority  word  of  the 
interfering  node.  If  the  busy  link  has  a higher  priority,  then  a "link  busy"  response 
is  sent  to  the  interface.  If  on  the  other  hand,  the  busy  link  is  in  use  at  lower 
priority,  then  an  interrupt  is  sent  to  the  link  users  informing  them  that  the  link 
will  be  preempted.  After  an  amount  of  time  determined  by  system  protocol,  the  link 
is  then  assigned  to  the  interferor  and  its  priority  word  is  set  to  the  priority  of 
the  new  originator. 
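
The first kind of conflict can be sketched as a comparison of two priority words. This
is an illustration under stated assumptions, not the node controller's actual code:
priorities are modeled as integers with larger meaning higher, ties are resolved in
favor of the current user (the text does not say what happens on a tie), and the
protocol-defined delay before reassignment is omitted. The `Link` class and function
name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    owner: str     # node currently using the link
    priority: int  # priority word stored by the node controller for this link


def request_busy_link(link, requester, priority):
    """Resolve a request for a link that is already in use."""
    if link.priority >= priority:  # tie goes to the current user (assumption)
        return "link busy"
    previous_owner = link.owner
    # The real system first interrupts the current users and waits for a
    # protocol-defined time; only the final reassignment is modeled here.
    link.owner = requester
    link.priority = priority
    return f"preempted {previous_owner}"
```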

The second kind of conflict occurs when two nodes, one on each end of a free
link, want to use the link for a path. Let us suppose that both nodes act simultaneously.
Each node controller has generated a priority word for its message and has
determined  a path.  Each  sends  out  a "get  link"  command,  with  the  priority  word 
attached,  to  the  other  node.  The  get  link  is  received  and  interpreted  by  each  node 
and  the  priority  words  are  compared.  The  node  with  the  higher  priority  will  send 
a link  busy  response  while  the  node  with  lower  priority  will  respond  with  "link 
established." 
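
The rule for simultaneous "get link" commands can be written as a one-line decision
made independently at each end of the link. In this sketch priorities are integers
with larger meaning higher, which is an assumption; the text does not cover equal
priorities, so the sketch raises an error in that case rather than guess. The function
name is hypothetical.

```python
def reply_to_get_link(own_priority, received_priority):
    """Response a node sends when its own get-link collides with an incoming one."""
    if own_priority == received_priority:
        raise ValueError("equal priorities are not covered by the described rule")
    if own_priority > received_priority:
        return "link busy"         # higher priority node keeps the link for itself
    return "link established"      # lower priority node yields the link
```

Evaluated at both ends, exactly one node answers "link established", so the
higher-priority message obtains the link.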

The  other  form  of  priority  implemented  in  the  system  is  the  static  priority 
attached  to  each  port  of  a node.  This  priority  is  established  with  a priority  encoder 
and can be changed only by rewiring. This priority scheme is used by the node controller
to decide which port to service first in the event that two or more ports receive
messages simultaneously.

Dedicated  Logic  Implementation 

The special purpose implementation consists of two sections: path selection logic
and path selection controller. The logic is a direct implementation of the algorithms
described in Section 6.2. The controller is based on a microprogrammed sequential
network,* which is a powerful technique for implementing sequential control units.

Path  Selection  Logic.  The  logic  described  herein  is  an  MSI  implementation  of  the 
path  creation  algorithm.  This  particular  design  can  be  used  with  systems  employing  up 
to  16  nodes.  However,  the  design  is  easily  extended  to  larger  networks. 

The design centers around five main structures: permanent path matrix and name
table ROM, scratchpad path matrix, node selection logic, path stack, and threaded path
stack. The actions performed by these structures are coordinated by a microprogrammed
controller (described in the next section of this report). The path selection hardware,
as shown in Figure 6-26 (see also the simplified version of this drawing shown in
Figure 6-27), was designed for a static network. For this reason, the permanent path
matrix (PPM) is contained in a ROM. Each time a message is to be transmitted, the
scratchpad path matrix RAM (PMR) must be refreshed. On command from the central
controller, the information contained in the lower half of the ROM is transferred to the
PMR. The path matrix in the PMR describes the current topology of the network as seen
from this node. It would be desirable if the path matrix were composed only of links
that are not busy, but the node controller only learns that a given link is busy when
it is trying to complete a path. Other links in the network may be busy but not
recorded in the PMR since the node controller has not tried to use them during the
current path construction. A node controller refreshes the PMR periodically because
the links that were busy during former path building attempts may now be available.

Figure 6-26. Detailed Path Selection Logic Diagram

Figure 6-27. Block Diagram of Path Selection Logic

The  transfer  of  path  matrix  data  from  the  PPM  to  the  PMR  is  usually  done  before  a 
path  is  requested  to  minimize  the  total  time  consumed  by  the  path  selection  logic. 

Each  node  in  the  system  is  identified  by  two  names:  a global  name  and  a local 
name. The global name is assigned by the network designer and is used for communication
between node controllers. The local name is assigned by the node controller and
is  used  during  path  calculations.  The  node  name  table  (NT)  translates  the  local  names, 
which  are  used  to  address  the  table,  to  the  global  names,  which  are  stored  in  the  upper 
half  of  the  ROM.  Before  the  path  selection  algorithm  can  be  executed,  the  global  name 
of  the  destination  node  is  strobed  into  the  node  wanted  (NW)  register.  Whenever  a 
local  name  is  placed  on  the  path  stack,  the  node  found  (NF)  register  is  loaded  with 
the  contents  from  the  location  in  the  NT  pointed  to  by  that  name.  If  the  NF  register 
matches  the  NW  register,  the  search  is  complete  and  the  path  is  threaded  using  the 
link  field  of  each  stack  word.  However,  if  NF  does  not  equal  NW  then  the  search 
continues. 


Addresses  for  the  PMR  originate  from  two  sources:  PMR  address  register  (SPM)  and 
path  stack  (PS).  When  the  PMR  is  being  refreshed,  the  SPM  register  is  first  cleared 
and  then  incremented  until  every  location  in  the  path  matrix  RAM  has  been  written  into 
from the corresponding location in the PPM. When data is being read from the PMR during
the search portion of the path selection algorithm, the SPM acts as a latch for
the  local  names  extracted  from  the  PS.  These  names  are  used  to  address  a row  in  the 
scratchpad  path  matrix  which  will  provide  the  next  source  of  interconnection  data. 

One final address source for the PMR does not stem from the path selection hardware,
but from the node controller. These addresses are generated during the path acquisition
sequence and are used when altering the scratchpad path matrix to correspond with
the current network topology.
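
The refresh and alteration of the scratchpad can be sketched as two small operations
on a matrix of bits. The representation (a list of rows) and the function names are
hypothetical. Clearing both directions of a link when it is found busy is an
assumption; the text says only that the scratchpad is altered to match the current
topology.

```python
def refresh_pmr(ppm):
    """Copy every location of the permanent path matrix into a fresh scratchpad."""
    pmr = []
    for spm in range(len(ppm)):  # SPM is cleared, then incremented through every row
        pmr.append(list(ppm[spm]))
    return pmr


def mark_link_busy(pmr, node_a, node_b):
    """Remove a link found busy during path acquisition (both directions assumed)."""
    pmr[node_a][node_b] = 0
    pmr[node_b][node_a] = 0
```

Because `refresh_pmr` builds new rows, later alterations of the scratchpad leave the
permanent matrix untouched, mirroring the ROM/RAM split described above.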

When  a row  of  the  scratchpad  path  matrix  is  read  from  the  PMR  during  execution  of 
the  path  selection  algorithm,  it  is  presented  to  the  node  selection  logic.  The  bit 
positions  in  a row  from  the  path  matrix  are  numbered  from  left  to  right  as  shown  in 
Figure  6-28. 

Figure 6-28. A Row from the Path Matrix

The bit positions correspond to the local names given each node by the node controller.
A "1" denotes a connection between two nodes. For example, from Figure 6-29, there
are eight links that lead away from node i.

Figure 6-29. Node i Connectivity


Thus,  during  path  selection,  all  eight  links  would  be  considered  candidates  for  the 
complete  path.  However,  to  avoid  circular  paths,  no  link  may  be  nominated  more  than 
once  during  the  path  search  operation.  Thus,  the  node  selection  logic  performs  two 
functions:  it  identifies  the  links  that  are  being  considered  for  a path,  using  the 
local  name  of  the  node  at  which  the  link  terminates  (from  the  searching  node's  point 
of  view);  and  it  prevents  a connection  from  being  used  more  than  once  for  any  path  by 
marking a node "used" whenever it is nominated. The used word (UW) register contains
a used  bit  for  each  node  in  the  network  which  identifies  the  nodes  being  considered  for 
a path.  The  bit  corresponding  to  a candidate  node  is  set  to  "0"  when  that  node  is 
nominated.  All  bits  corresponding  to  nodes  not  under  consideration  remain  at  "1". 

The  UW  register  is  numbered  the  same  as  a row  from  the  path  matrix. 

As  a row  is  read  from  the  PMR,  it  is  placed  in  the  row  register  (RR).  All  16 
used  bits  from  the  UW  register  are  connected  to  the  corresponding  clear  lines  at  the 
RR.  If  a node  has  already  been  included  in  the  path  stack,  its  bit  in  the  RR  will  be 
forced  to  "0"  regardless  of  its  value  in  the  PMR. 

The  outputs  of  the  RR  are  fed  to  a priority  encoder  which  determines  the  position 
of  the  left-most  "1"  and  outputs  the  local  name  assigned  to  that  position.  The  local 
name  is  then  loaded  into  a holding  register  (HR)  to  avoid  losing  this  information  when 
the  final  section  of  the  node  selection  logic  is  enabled.  The  last  section  consists 
of a 4-line-to-16-line decoder whose outputs are connected to the clear inputs of
the  UW.  Thus,  the  four-bit  local  name  energizes  one  of  the  decoder  output  lines  which 
in  turn  sets  the  corresponding  used  bit  to  "0".  The  used  bit  clears  the  RR  bit  and 
the  priority  encoder  determines  the  new  left-most  "1".  This  iteration  continues  until 
every  "1"  in  the  RR  has  been  cleared.  If  the  destination  node  was  not  among  those 
selected  from  the  present  path  matrix  row,  then  a new  row  must  be  processed  by  the 
node  selection  logic. 
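
The interplay of the row register, priority encoder, holding register, and used word
can be traced with a bit-level sketch. Bits are held in Python lists so that index 0
is the left-most position; numbering the positions from 0 is an assumption, since
Figure 6-28 is not reproduced here. The function name is hypothetical.

```python
N_NODES = 16


def scan_row(row, used_word):
    """Nominate every unused node that has a "1" in this path matrix row.

    row, used_word: lists of N_NODES bits. In used_word a 1 means "not yet
    nominated" and a 0 means "used". used_word is updated in place.
    Returns the local names in the order the priority encoder produces them.
    """
    # Used bits drive the clear lines of the row register (RR).
    rr = [r & u for r, u in zip(row, used_word)]
    nominated = []
    while 1 in rr:
        name = rr.index(1)      # priority encoder: position of the left-most "1"
        nominated.append(name)  # value latched in the holding register (HR)
        used_word[name] = 0     # decoder output clears the used bit ...
        rr[name] = 0            # ... which in turn clears the RR bit
    return nominated
```

Scanning the same row a second time nominates nothing, which is how the used word
prevents a node from entering the path stack twice.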

The  HR  outputs  are  used  for  two  other  operations  besides  the  node  selection  logic. 
First,  the  local  name  held  in  the  HR  is  used  to  address  the  name  table  to  yield  the 
equivalent  global  name,  which  is  loaded  into  the  NF  register,  and  then  compared  to 
the  NW  register.  Second,  the  latched  local  name  is  positioned  in  the  name  field  of 
the  stack  word. 

The path stack is a RAM that is addressed by a pointer (the stack pointer, STP)
which is incremented each time a word is to be written into memory. Each word in the
stack consists of a name field and a link field (see Figure 6-30).


Figure 6-30. Stack Word (fields: NAME FIELD, LINK FIELD)


The  name  field  is  filled  with  the  local  name  of  a node  while  the  link  field  contains 
a pointer  back  to  a lower  position  in  the  stack.  The  node  specified  by  the  name  field 
of  the  lower  stack  position  was  used  as  a "point-of-search"  from  which  new  candidate 
nodes,  if  any,  could  be  discovered  (by  scanning  the  lower  node's  row  in  the  path 
matrix).  These  new  candidates  would  all  be  linked  back  to  their  "point-of-search" 
node  using  the  stack  position  of  the  point-of-search  node.  For  example  (Figure  6-31), 
if  the  point-of-search  node  "E"  finds  two  new  candidates,  "S"  and  "R",  then  when  they 
are included in the path stack their link fields will be equal to E's position in the
stack (which is the current value of the link pointer, i.e., 2). When a new
point-of-search node is required, the link pointer (LKP) is incremented.

Figure 6-31. Path Stack

During  path  search,  the  addresses  for  the  PS  come  from  STP  which  is  incremented 
each  time  the  stack  gains  a new  member.  However,  once  the  destination  node  has  been 
placed  on  the  stack,  the  path  threading  operation  commences  and  the  addresses  then 
come from the link fields. Thus, if "S" had been the destination node, the stack
address of the next member of the path would come from S's link field, viz., 2. E's
link field points to D, which in turn points to G. Thus, the path would be given as
G-D-E-S.

The  threaded  path  stack  (TPS)  is  loaded  with  the  local  names  of  the  nodes  making 
up the completed path. In the example above, the nodes placed on the TPS would be
S, E, D, and G in that order. At this point, the sequential controller for the path
selection hardware would notify the node controller that a path has been formulated.
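
The threading operation can be illustrated with a minimal Python sketch. It is not
the hardware implementation. It assumes that stack position 0 holds the originating
node G with a zero link, which is inferred from the G-D-E-S example and is not stated
explicitly in the text, and it stops after reading position 0 so that the S, E, D, G
ordering is reproduced. The names PATH_STACK and thread_path are hypothetical.

```python
# Hypothetical model of the path stack: each word is (name field, link field).
# Assumption: position 0 holds the originating node G with a zero link.
PATH_STACK = [
    ("G", 0),  # position 0: originating node (assumed)
    ("D", 0),  # position 1: found while searching from G
    ("E", 1),  # position 2: found while searching from D
    ("S", 2),  # position 3: found while searching from E
    ("R", 2),  # position 4: found while searching from E
]


def thread_path(stack, destination_position):
    """Follow link fields from the destination back to position 0.

    Returns the threaded path stack (TPS) contents, destination first.
    Assumes every link points to a lower position, so the loop terminates.
    """
    tps = []
    address = destination_position
    while True:
        name, link = stack[address]
        tps.append(name)          # name field goes onto the TPS
        if address == 0:          # assumed stop condition: origin has been read
            break
        address = link            # link field becomes the next stack address
    return tps


tps = thread_path(PATH_STACK, 3)
print(tps)                         # ['S', 'E', 'D', 'G']
print("-".join(reversed(tps)))     # G-D-E-S
```

Reading the TPS in reverse gives the path in travel order, from the originating node
to the destination.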

Path Selection Controller. The control functions required by the path selection
hardware  are  implemented  using  a restricted  type  of  Moore  machine  in  which  there  are 
only  two  possible  exits  from  any  state.  The  detailed  schematic  of  the  hardware  is 
shown  on  Figure  6-26.  The  controller  is  in  the  top  left-hand  corner  of  the  drawing. 

The  controller  has  three  basic  parts:  read-only  memory,  state  counter,  and  a 
multiplexer.  There  are  three  types  of  output  signals  from  the  ROM:  control  lines 
which  are  routed  to  the  various  path  selection  chips,  address  lines  which  define  the 
next  state  address  in  the  event  of  a jump,  and  multiplexer  control  lines  which  are 
used  to  select  which  input  variable  is  monitored  to  determine  if  the  jump  is  to  be 
taken. 

The number of bits in the state counter is found by taking the ceiling of
log2(n), where n is the number of states in the state diagram (see Figure 6-32). The
state diagram for the path selection algorithm uses 22 states; thus the state counter
has ⌈log2(22)⌉ bits,* i.e., 5 bits. This counter is preset to the value of the next
address lines when implementing a jump in the state diagram, or it can be cleared to
all zeroes, which is used as an entry point to the state diagram. State 8 of Figure
6-32 depicts a jump condition. If PF is true, then the state counter is incremented
and state nine will be executed next. However, if PF is false (i.e., the complement
of PF is true), then the
next address lines (encoded with the binary expression for 17) are strobed into the
state  counter  and  state  17  will  be  the  next  executed  state.  The  use  of  state  zero  as 
an  entry  point  is  shown  in  the  bottom  left-hand  corner  of  the  state  diagram.  Note  that 
state  zero  can  be  entered  at  any  time  by  pulsing  the  CLEAR  input  on  the  state  counter. 
In  this  case,  clearing  the  state  counter  would  initiate  the  PMR  refresh  cycle  after 
which  the  controller  would  enter  state  four  and  wait  for  a START  command. 
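
As a quick check of the counter sizing rule, the following Python sketch evaluates the
ceiling of log2(n) with integer arithmetic. The function name select_bits is
hypothetical; the value 22 comes from the text, and 8 is included only as a second
data point.

```python
def select_bits(n):
    """Return the ceiling of log2(n) for n >= 1, using exact integer arithmetic."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every positive integer n.
    return (n - 1).bit_length()


print(select_bits(22))  # 5
print(select_bits(8))   # 3
```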

The  number  of  lines  entering  the  multiplexer  depends  on  the  number  of  conditional 
signals  present  in  the  hardware.  As  can  be  seen  from  the  state  diagram,  there  are 
five conditionals: CRYSPM, START, PF, A=B, and A<B. In actuality, there are two
more conditionals, "1" and "0". The "1" conditional is used for the situation in
which a jump always occurs, for example, the transition from state 21 to state 6 or
the transition from state 19 to state 4. These jumps occur because of loops in the
flow of control. The "0" conditional is used for nonjump situations, e.g., the
transition from state 6 to state 7. This input to the multiplexer could be eliminated
by using an extra ROM output to enable the multiplexer only for jump situations.
However, in this case an eight-input multiplexer will be used so that the extra input
is already available. The number of mux control lines needed can then be calculated
with the same formula used for finding the number of bits in the state counter, where
n is now the number of input lines to the mux instead of being the number of states
in the state diagram.

*⌈X⌉ is the ceiling of X, i.e., the smallest integer greater than or equal to X.

Figure 6-32. Sequential Controller State Diagram
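
The two-exit sequencing described above (increment, or jump to the ROM-supplied
address when the selected conditional is true) can be sketched in Python as follows.
This is an illustration only. The RomWord layout, the input name NOT_PF, and the ROM
fragment are assumptions; only the transitions for states 6, 8, and 21 are taken from
the text, and the control-line contents are omitted.

```python
from typing import Dict, NamedTuple


class RomWord(NamedTuple):
    controls: str      # placeholder for the control-line outputs (not modeled)
    mux_select: str    # which conditional input the multiplexer monitors
    next_address: int  # state loaded into the counter if the jump is taken


# Hypothetical ROM fragment covering only the transitions named in the text.
ROM_FRAGMENT: Dict[int, RomWord] = {
    6: RomWord("(omitted)", "0", 0),         # never jumps: 6 -> 7
    8: RomWord("(omitted)", "NOT_PF", 17),   # PF false: 8 -> 17, PF true: 8 -> 9
    21: RomWord("(omitted)", "1", 6),        # always jumps: 21 -> 6
}


def mux_inputs(pf: bool) -> Dict[str, bool]:
    """Conditional inputs seen by the multiplexer (only those used here)."""
    return {"1": True, "0": False, "NOT_PF": not pf}


def step(state: int, rom: Dict[int, RomWord], inputs: Dict[str, bool]) -> int:
    """Return the next state: preset on a jump, otherwise increment."""
    word = rom[state]
    jump = inputs[word.mux_select]
    return word.next_address if jump else state + 1


def clear() -> int:
    """Pulsing CLEAR forces the state counter to the entry point, state zero."""
    return 0


print(step(8, ROM_FRAGMENT, mux_inputs(pf=True)))    # 9
print(step(8, ROM_FRAGMENT, mux_inputs(pf=False)))   # 17
print(step(21, ROM_FRAGMENT, mux_inputs(pf=False)))  # 6
```

Because each state has only an increment exit and one jump exit, a single next-address
field per ROM word is sufficient.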

The  number  of  bits  needed  in  the  ROM  can  be  reduced  somewhat  by  storing  the 
control signal information in an encoded form and using n-to-2^n decoders. For example,
if we have four control signals and each signal occurs during different states, they
can then be encoded with two bits. These two bits are then decoded using a 2-to-4
decoder to yield the four control signals. Using this technique, the number of bits
needed for the control signals was reduced from 37 to 14.
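
The encoding idea can be sketched as a small Python model of a 2-to-4 decoder. The
signal names below are hypothetical placeholders, and the grouping actually used to
reach 14 bits is not given in the text.

```python
def decode(code, n_bits):
    """Model of an n-to-2^n decoder: return a one-hot list of 2**n_bits outputs."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range for decoder width")
    return [1 if i == code else 0 for i in range(2 ** n_bits)]


# Hypothetical names for four control signals that never occur in the same state.
SIGNALS = ["SIG_A", "SIG_B", "SIG_C", "SIG_D"]


def active_signals(code):
    """Return the names of the control signals asserted for a 2-bit ROM field."""
    outputs = decode(code, 2)
    return [name for name, bit in zip(SIGNALS, outputs) if bit]


print(decode(2, 2))        # [0, 0, 1, 0]
print(active_signals(3))   # ['SIG_D']
```

Note that a plain 2-to-4 decoder asserts exactly one output for every input code, so a
state in which none of the four signals is wanted would need a decoder enable or a
reserved code; the text does not say which approach was taken.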

The  detailed  flowchart  for  the  controller  for  the  path  selection  algorithm  is 
presented  in  Figure  6-33.  All  the  element  names  (e.g.,  PS,  PMR,  etc.)  are  taken  from 
the logic diagram (Figure 6-26). There are three main sections in the flowchart:
the first section deals with refreshing the scratchpad version of the path matrix, the
second  section  describes  path  search,  while  the  third  deals  with  path  threading. 

The scratchpad path matrix (PMR) is renewed by simply copying into it the contents
of the permanent path matrix (PPM). This is done in four steps: read a word
from the PPM, write that word into the PMR, increment the address register (SPM), and
check to see if the process is complete. The last two operations are performed
essentially at the same time. When the address register is incremented, carry out
(CRYSPM) is checked for a "1", which would signify the end of the refresh cycle. If
CRYSPM is a "0", then the next word is read from the PPM, etc. Having generated a
fresh copy of the path matrix, the controller sits in a wait loop (state 4) until a
request for a path search is received.
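
The four-step refresh cycle can be sketched in Python as below. The 4-bit address
width (16 rows) is an assumption based on the 4-bit local names used elsewhere in this
section, and the function name refresh_scratchpad is hypothetical.

```python
def refresh_scratchpad(ppm, address_bits=4):
    """Copy the permanent path matrix (PPM) into a fresh scratchpad (PMR).

    Assumption: the address register SPM is address_bits wide and its carry
    out (CRYSPM) goes to 1 when the count wraps past the last row.
    """
    rows = 2 ** address_bits
    if len(ppm) != rows:
        raise ValueError("PPM must hold one word per address")
    pmr = [0] * rows
    spm = 0
    while True:
        word = ppm[spm]          # step 1: read a word from the PPM
        pmr[spm] = word          # step 2: write that word into the PMR
        spm += 1                 # step 3: increment the address register
        cryspm = spm == rows     # step 4: carry out marks the end of the cycle
        spm %= rows              # the counter itself wraps back to zero
        if cryspm:
            break
    return pmr


ppm = list(range(16))            # stand-in matrix words, one per row
pmr = refresh_scratchpad(ppm)
print(pmr == ppm)                # True
```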

When  a path  search  is  finally  requested  (START=1),  the  controller  initializes  the 
hardware  by  clearing  address  registers  SPM  and  STP,  the  link  field  pointer  (LKP),  and 
the  threaded  path  stack  (TPS).  It  also  strobes  the  destination  global  node  name 
into  the  NW  register  and  sets  the  used  word  register  (UW)  to  all  ones. 

It  is  assumed  that  row  zero  in  the  path  matrix  represents  connections  to  the  node 
at  which  the  path  is  being  formulated.  Thus,  clearing  SPM  ensures  that  the  first  nodes 
nominated as potential path members will be one branch away. Each bit in the used word
register enables the corresponding bit in the row register (RR). If the UW bit is high,
then the RR bit will be loaded from the chosen row in the scratchpad path matrix.
However, if the UW bit is zero, indicating that the node associated with that
particular bit has already been pushed on the stack, then the RR bit would be held at
zero.

Figure 6-33. Sequential Controller Flowchart

At this point, the node selection logic is enabled. This set of gates locates
the  left-most  "1"  in  the  word  held  by  the  row  register  and  decodes  the  position  of 
that  bit  in  the  word  (which  is  the  local  name  for  that  particular  node).  If  the  row 
is  all  zeroes,  the  priority  flag  (PF)  will  remain  at  zero.  Assuming  that  a node  has 
been found, the 4-bit local name is strobed into the holding register (HR). The
contents of the HR are sent to a 4-line-to-16-line decoder whose outputs are used to
clear the bit associated with that node in the UW register. They are also used to
fill the node name field of the stack word and as an address for the name table to
translate the local node name into the global node name (note that the global node
name does not have to be limited to 4 bits; however, if it is greater than 4 bits,
the NW, NF, and comparator circuit have to be extended to accommodate the larger word
size).
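
The node selection step behaves like a priority encoder gated by the used word
register. The Python sketch below models it with integers. It assumes a 16-bit word
(one bit per node) and that the left-most bit corresponds to local name 0; neither
the bit ordering nor the function names are given in the text.

```python
WIDTH = 16  # assumed word width: one bit per node, 4-bit local names


def load_row_register(row_word, uw):
    """RR bits are loaded from the row only where the matching UW bit is 1."""
    return row_word & uw


def select_node(rr):
    """Locate the left-most 1 in RR.

    Returns (PF, local_name). PF stays False when the row is all zeroes.
    Assumption: the left-most bit position is local name 0.
    """
    if rr == 0:
        return False, None
    return True, WIDTH - rr.bit_length()


def clear_used_bit(uw, local_name):
    """Clear the UW bit for a node that has just been selected."""
    return uw & ~(1 << (WIDTH - 1 - local_name))


uw = 0xFFFF                         # all nodes still unused
row = 0b0010_0100_0000_0000         # connections to local names 2 and 5
pf, hr = select_node(load_row_register(row, uw))
print(pf, hr)                       # True 2
uw = clear_used_bit(uw, hr)
pf, hr = select_node(load_row_register(row, uw))
print(pf, hr)                       # True 5
```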

Upon finding a connection, the stack pointer (STP) is incremented and the new
stack word, which is a concatenation of the contents of the HR and link pointer
registers, is loaded into the path stack. Once this has been accomplished, the NW and NF
registers  are  compared  to  find  out  if  the  last  node  included  in  the  stack  was  the 
destination  node.  If  it  was  not,  then  the  node  selection  logic  is  enabled  again  in 
hopes  of  finding  another  candidate.  If  the  destination  node  has  been  reached,  then 
the  path  must  be  threaded  and  stored  in  the  TPS. 

When  the  destination  node  has  been  located,  the  STP  register  still  points  to  the 
stack at the destination node position. The PS is then read at this address, with the
result that the node name field is stored in the threaded path stack while the link
field  is  strobed  into  the  STP  register.  This  process  continues  until  the  link  field 
is  null  (zero),  which  indicates  that  the  complete  path  has  been  stored  in  the  TPS. 

The  node  controller  is  notified  of  the  existence  of  a path  when  PATH  AVAILABLE  is 
raised  by  the  path  selection  controller. 

If  the  row  register  contains  all  zeroes,  which  occurs  when  all  the  candidates 
connected  to  the  point-of-search  node  have  been  included  in  the  PS,  then  the  link 
pointer  is  incremented  and  compared  to  the  stack  pointer.  If  LKP  is  greater  than  STP, 
then a path is impossible because all possible candidates have been nominated. In
this case, the NO PATH flag is raised, informing the node controller of this
unfortunate situation. If LKP is less than or equal to STP, then the name of the next
point-of-search node is read from the stack and its row in the path matrix is checked
for new, unused candidates.
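
Taken together, the search steps amount to a breadth-first search driven by the stack
pointer and link pointer. The following Python sketch is one reading of the flow
described above and is not a transcription of Figure 6-33. It assumes that stack
position 0 represents the originating node (local name 0, row zero), models the
registers as ordinary variables, and uses a small hypothetical five-node matrix that
lists no connections back to the originating node.

```python
def path_search(path_matrix, name_table, destination):
    """Search for the destination global name.

    Returns (found, stack), where each stack word is (local_name, link).
    Assumption: position 0 stands for the originating node with a zero link.
    """
    n = len(path_matrix)
    used = [True] * n       # UW: set to all ones at the start of a search
    stack = [(0, 0)]        # position 0: originating node (assumed)
    stp = 0                 # stack pointer
    lkp = 0                 # link pointer: position of the point-of-search node
    row = 0                 # SPM cleared, so the search starts with row zero
    while True:
        # Row register: bits of the chosen row, gated by the used word register.
        candidates = [j for j in range(n) if path_matrix[row][j] and used[j]]
        if candidates:
            hr = candidates[0]                 # left-most 1 (priority select)
            used[hr] = False                   # clear the node's UW bit
            stp += 1
            stack.append((hr, lkp))            # name field, link field
            if name_table[hr] == destination:  # NF compared with NW
                return True, stack
        else:
            lkp += 1                           # next point-of-search node
            if lkp > stp:
                return False, stack            # NO PATH
            row = stack[lkp][0]


# Hypothetical network: G-D, D-E, E-S, E-R (local names 0..4).
MATRIX = [
    [0, 1, 0, 0, 0],   # row 0: G (originating node)
    [0, 0, 1, 0, 0],   # row 1: D
    [0, 1, 0, 1, 1],   # row 2: E
    [0, 0, 1, 0, 0],   # row 3: S
    [0, 0, 1, 0, 0],   # row 4: R
]
NAMES = ["G", "D", "E", "S", "R"]

found, stack = path_search(MATRIX, NAMES, "R")
print(found)   # True
print(stack)   # [(0, 0), (1, 0), (2, 1), (3, 2), (4, 2)]
```

In this example both S and R carry a link of 2, the stack position of E, which matches
the earlier description of the link field. When the search succeeds, the last stack
position holds the destination and is the starting address for path threading.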

Recommendations for Future Work

The  path  selection  logic  appears  amenable  to  integration  with  the  exception  of 
the  permanent  path  matrix  and  name  table  ROM.  The  controller  implementation  might  be 
greatly simplified with use of the recently announced Fairchild microprogram
sequencer. In addition to assessing the impact of this new product, a complete
register transfer level simulation should be performed to verify correct operation
prior to commencing the integration effort. In addition to verifying correct operation
of the logic, a register transfer simulation greatly simplifies the development
of test methods.


7.  RECOMMENDATIONS  FOR  FUTURE  WORK 


With the study phase and the first phase of a projected three-phase technology
development  effort  an  accomplished  fact,  the  program  goals  and  objectives  originally 
set  forth  by  the  Naval  Research  Laboratory  and  TRW  remain  unchanged.  Based  on  the 
encouraging  experimental  results  to  date,  the  next  phase  should  be  the  experimental 
verification  of  the  computational  functions  required  for  the  demonstration  vehicle 
to  follow.  No  deviation  from  the  original  plan  is  required. 

Initiation of an effort to investigate and characterize in a radiation environment
the digital CCD circuitry being developed is recommended. This effort should be
directed by the Naval Research Laboratory; their experience in this area will be
extremely beneficial. The beginning effort should be one of determining the particular
characteristics of digital CCDs in the anticipated environment. The differences in
design and operation are likely to generate different responses to radiation in the
digital devices compared to analog CCDs. Having made these measurements, the results
can then be used to gauge what appropriate action should be taken to improve the
radiation resistance of the device. This work should begin as soon as possible. The
first computational device array is available from the DP-1 test devices. Device
performance measurements should be made before future arrays are further developed.